We employ unsupervised machine learning to enhance the accuracy of our
recently presented scaling method for wave confinement analysis [1]. We use
the standard k-means++ algorithm as well as our own model-based algorithm. We
investigate cluster validity indices as a means to find the correct number of
confinement dimensionalities to be used as an input to the clustering
algorithms. Subsequently, we analyze the performance of the two clustering
algorithms when compared to the direct application of the scaling method
without clustering. We find that the clustering approach provides more
physically meaningful results, but may struggle with identifying the correct
set of confinement dimensionalities. We conclude that the most accurate outcome
is obtained by first applying the direct scaling to find the correct set of
confinement dimensionalities and subsequently employing clustering to refine
the results. Moreover, our model-based algorithm outperforms the standard
k-means++ clustering.
( 2
min )
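The clustering step above can be illustrated with a minimal, self-contained k-means implementation using k-means++ seeding. This is a generic NumPy sketch, not the authors' model-based algorithm; the two-blob data and the cluster count are invented for illustration:

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: new centers drawn with probability ~ squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans(X, k, iters=20, seed=0):
    """Lloyd iterations from a k-means++ start (assumes no cluster goes empty)."""
    rng = np.random.default_rng(seed)
    C = kmeans_pp_init(X, k, rng)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, C

# Two well-separated synthetic groups; k = 2 should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, C = kmeans(X, 2)
```

In practice the number of clusters would be swept and each candidate scored with a validity index (e.g., silhouette), mirroring the search over confinement dimensionalities described above.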
We identify and explore connections between the recent literature on
multi-group fairness for prediction algorithms and the pseudorandomness notions
of leakage-resilience and graph regularity. We frame our investigation using
new, statistical distance-based variants of multicalibration that are closely
related to the concept of outcome indistinguishability. Adopting this
perspective leads us naturally not only to our graph theoretic results, but
also to new, more efficient algorithms for multicalibration in certain
parameter regimes and a novel proof of a hardcore lemma for real-valued
functions.
( 2
min )
In this paper, we propose a novel approach for solving Bayesian inverse
problems with physics-informed invertible neural networks (PI-INN). The
architecture of PI-INN consists of two sub-networks: an invertible neural
network (INN) and a neural basis network (NB-Net). With the aid of the NB-Net,
the invertible map between the parametric input and the INN output is
constructed to provide a tractable estimation of the posterior distribution, which enables
efficient sampling and accurate density evaluation. Furthermore, the loss
function of PI-INN includes two components: a residual-based physics-informed
loss term and a new independence loss term. The presented independence loss
term can Gaussianize the random latent variables and ensure statistical
independence between two parts of INN output by effectively utilizing the
estimated density function. Several numerical experiments are presented to
demonstrate the efficiency and accuracy of the proposed PI-INN, including
inverse kinematics, inverse problems of the 1-d and 2-d diffusion equations,
and seismic traveltime tomography.
( 2
min )
Previous studies have shown that leveraging domain index can significantly
boost domain adaptation performance (arXiv:2007.01807, arXiv:2202.03628).
However, such domain indices are not always available. To address this
challenge, we first provide a formal definition of domain index from the
probabilistic perspective, and then propose an adversarial variational Bayesian
framework that infers domain indices from multi-domain data, thereby providing
additional insight into domain relations and improving domain adaptation
performance. Our theoretical analysis shows that our adversarial variational
Bayesian framework finds the optimal domain index at equilibrium. Empirical
results on both synthetic and real data verify that our model can produce
interpretable domain indices which enable us to achieve superior performance
compared to state-of-the-art domain adaptation methods. Code is available at
https://github.com/Wang-ML-Lab/VDI.
( 2
min )
Modern machine learning systems are increasingly trained on large amounts of
data embedded in high-dimensional spaces. Often this is done without analyzing
the structure of the dataset. In this work, we propose a framework to study the
geometric structure of the data. We make use of our recently introduced
non-negative kernel (NNK) regression graphs to estimate the point density,
intrinsic dimension, and the linearity of the data manifold (curvature). We
further generalize the graph construction and geometric estimation to multiple
scales by iteratively merging neighborhoods in the input data. Our experiments
demonstrate the effectiveness of our proposed approach over other baselines in
estimating the local geometry of the data manifolds on synthetic and real
datasets.
( 2
min )
Motor brain-computer interface (BCI) development relies critically on neural
time series decoding algorithms. Recent advances in deep learning architectures
allow for automatic feature selection to approximate higher-order dependencies
in data. This article presents the FingerFlex model - a convolutional
encoder-decoder architecture adapted for finger movement regression on
electrocorticographic (ECoG) brain data. State-of-the-art performance was
achieved on a publicly available BCI competition IV dataset 4 with a
correlation coefficient between true and predicted trajectories up to 0.74. The
presented method provides the opportunity for developing fully-functional
high-precision cortical motor brain-computer interfaces.
( 2
min )
Hardware Trojans (HTs) are undesired design or manufacturing modifications
that can severely alter the security and functionality of digital integrated
circuits. HTs can be inserted according to various design criteria, e.g., nets
switching activity, observability, controllability, etc. However, to our
knowledge, most HT detection methods are only based on a single criterion,
i.e., nets switching activity. This paper proposes a multi-criteria
reinforcement learning (RL) HT detection tool that features a tunable reward
function for different HT detection scenarios. The tool allows for exploring
existing detection strategies and can adapt to new detection scenarios with
minimal effort. We also propose a generic methodology for comparing HT
detection methods fairly. Our preliminary results show an average of 84.2%
successful HT detection on the ISCAS-85 benchmark.
( 2
min )
The proposed BSDE-based diffusion model represents a novel approach to
diffusion modeling, which extends the application of stochastic differential
equations (SDEs) in machine learning. Unlike traditional SDE-based diffusion
models, our model can determine the initial conditions necessary to reach a
desired terminal distribution by adapting an existing score function. We
demonstrate the theoretical guarantees of the model, the benefits of using
Lipschitz networks for score matching, and its potential applications in
various areas such as diffusion inversion, conditional diffusion, and
uncertainty quantification. Our work represents a contribution to the field of
score-based generative learning and offers a promising direction for solving
real-world problems.
( 2
min )
In this paper we present the Zeitview Rooftop Geometry (ZRG) dataset. ZRG
contains thousands of samples of high resolution orthomosaics of aerial imagery
of residential rooftops with corresponding digital surface models (DSM), 3D
rooftop wireframes, and multiview imagery generated point clouds for the
purpose of residential rooftop geometry and scene understanding. We perform
thorough benchmarks to illustrate the numerous applications unlocked by this
dataset and provide baselines for the tasks of roof outline extraction,
monocular height estimation, and planar roof structure extraction.
( 2
min )
We adapt reinforcement learning (RL) methods for continuous control to bridge
the gap between complete ignorance and perfect knowledge of the environment.
Our method, Partial Knowledge Least Squares Policy Iteration (PLSPI), takes
inspiration from both model-free RL and model-based control. It uses incomplete
information from a partial model and retains RL's data-driven adaptation towards
optimal performance. The linear quadratic regulator provides a case study;
numerical experiments demonstrate the effectiveness and resulting benefits of
the proposed method.
( 2
min )
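Since the linear quadratic regulator serves as the case study, the model-based side can be sketched by solving the discrete-time Riccati equation by fixed-point iteration. This is standard textbook material, not the PLSPI algorithm itself, and the double-integrator model is illustrative:

```python
import numpy as np

# Double-integrator dynamics (illustrative partial model, not the paper's).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)           # state cost
R = np.array([[1.0]])   # input cost

# Fixed-point iteration on the discrete algebraic Riccati equation.
P = np.eye(2)
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # current feedback gain
    P = Q + A.T @ P @ (A - B @ K)                      # Riccati update
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

closed_loop = A - B @ K                          # control law u = -K x
rho = max(abs(np.linalg.eigvals(closed_loop)))   # spectral radius < 1 => stable
```

A data-driven method like least squares policy iteration replaces the exact model (A, B) in this computation with quantities estimated from trajectories.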
In this study, toward addressing the over-confident outputs of existing
artificial intelligence-based colorectal cancer (CRC) polyp classification
techniques, we propose a confidence-calibrated residual neural network.
Utilizing a novel vision-based tactile sensing (VS-TS) system and unique CRC
polyp phantoms, we demonstrate that traditional metrics such as accuracy and
precision are not sufficient to capture model performance in a task as
sensitive as CRC polyp diagnosis. To this end, we develop a residual neural
network classifier and address its over-confident outputs for CRC polyps
classification via the post-processing method of temperature scaling. To
evaluate the proposed method, we introduce noise and blur to the obtained
textural images of the VS-TS and test the model's reliability for non-ideal
inputs through reliability diagrams and other statistical metrics.
( 2
min )
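Temperature scaling, the post-processing step mentioned above, amounts to dividing the network's logits by a scalar T fitted on held-out data. A minimal NumPy sketch with synthetic over-confident logits (all numbers invented for illustration; real implementations fit T by optimizing validation NLL, here a simple grid search):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of temperature-scaled probabilities."""
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

# Toy over-confident logits on a synthetic validation set.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, 200)
logits = rng.normal(0, 1, (200, 3))
logits[np.arange(200), labels] += 2.0                 # mostly right...
flip = rng.random(200) < 0.3                          # ...but sometimes
logits[flip] = rng.normal(0, 1, (flip.sum(), 3)) * 5  # confidently wrong

# Temperature scaling: pick T > 0 minimizing validation NLL.
Ts = np.linspace(0.5, 5.0, 100)
T_star = Ts[np.argmin([nll(logits, labels, T) for T in Ts])]
```

Because T only rescales logits, accuracy is unchanged; only the confidence estimates are calibrated, which is what reliability diagrams then assess.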
Accurate detection of human presence in indoor environments is important for
various applications, such as energy management and security. In this paper, we
propose a novel system for human presence detection using the channel state
information (CSI) of WiFi signals. Our system, named attention-enhanced deep
learning for presence detection (ALPD), employs an attention mechanism to
automatically select informative subcarriers from the CSI data and a
bidirectional long short-term memory (LSTM) network to capture temporal
dependencies in CSI. Additionally, we utilize a static feature to improve the
accuracy of human presence detection in static states. We evaluate the proposed
ALPD system by deploying a pair of WiFi access points (APs) to collect a CSI
dataset, which is compared against several benchmarks. The results
demonstrate that our ALPD system outperforms the benchmarks in terms of
accuracy, especially in the presence of interference. Moreover, bidirectional
transmission data benefits training by improving stability and accuracy, as
well as by reducing the costs of data collection. Overall, our
proposed ALPD system shows promising results for human presence detection using
WiFi CSI signals.
( 2
min )
Text classification with large tag systems in natural language processing
faces several challenges: multiple tag systems, uneven data distribution, and
high noise. To address these problems, the ESimCSE unsupervised contrastive
learning model and the UDA semi-supervised contrastive learning model are
combined through joint training. The ESimCSE model efficiently learns text
vector representations from unlabeled data to achieve better classification
results, while UDA is trained on unlabeled data with semi-supervised learning
methods to improve the models' prediction performance and stability and to
further improve generalization. In addition, the adversarial training
techniques FGM and PGD are used during training to improve the robustness and
reliability of the model. Experimental results show accuracy improvements of
8% and 10% relative to the baseline on the public Reuters dataset and on an
operational dataset, respectively, and a 15% improvement in manual validation
accuracy on the operational dataset, indicating that the method is effective.
( 2
min )
We propose an experimental scheme for performing sensitive, high-precision
laser spectroscopy studies on fast exotic isotopes. By inducing a step-wise
resonant ionization of the atoms travelling inside an electric field and
subsequently detecting the ion and the corresponding electron, time- and
position-sensitive measurements of the resulting particles can be performed.
Using a Mixture Density Network (MDN), we can leverage this information to
predict the initial energy of individual atoms and thus apply a Doppler
correction of the observed transition frequencies on an event-by-event basis.
We conduct numerical simulations of the proposed experimental scheme and show
that kHz-level uncertainties can be achieved for ion beams produced at extreme
temperatures ($> 10^8$ K), with energy spreads as large as $10$ keV and
non-uniform velocity distributions. The ability to perform in-flight
spectroscopy, directly on highly energetic beams, offers unique opportunities
to study short-lived isotopes with lifetimes in the millisecond range and
below, produced in low quantities, in hot and highly contaminated environments,
without the need for cooling techniques. Such species are of marked interest
for nuclear structure, astrophysics, and new physics searches.
( 2
min )
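The event-by-event Doppler correction can be sketched as follows: given each ion's predicted kinetic energy, compute its relativistic factors and shift the observed frequency back to the rest frame. The collinear-geometry formula below and all beam parameters (mass, energies, transition frequency) are illustrative assumptions, not values from the paper:

```python
import numpy as np

AMU_KEV = 931494.1  # atomic mass unit in keV/c^2

def doppler_correct(f_lab, e_kin_kev, mass_amu):
    """Relativistic Doppler correction for collinear geometry (toy sketch)."""
    gamma = 1.0 + e_kin_kev / (mass_amu * AMU_KEV)
    beta = np.sqrt(1.0 - 1.0 / gamma**2)
    return f_lab * gamma * (1.0 - beta)

# Simulated bunch: ~30 keV beam with a 10 keV energy spread; a fixed rest-frame
# transition frequency f0 is Doppler-shifted differently for each ion.
rng = np.random.default_rng(0)
f0 = 7.0e14   # Hz (illustrative optical transition)
mass = 40.0   # amu (illustrative)
e_kin = rng.uniform(25.0, 35.0, 10000)       # keV, per-ion kinetic energy
gamma = 1.0 + e_kin / (mass * AMU_KEV)
beta = np.sqrt(1.0 - 1.0 / gamma**2)
f_lab = f0 / (gamma * (1.0 - beta))          # observed lab-frame frequency

# Without per-event energies the line is smeared; with them it collapses to f0.
f_corrected = doppler_correct(f_lab, e_kin, mass)
```

In the proposed scheme the per-event energy is not known exactly but predicted by the MDN from time- and position-sensitive detector signals, so the corrected line acquires a width set by the prediction uncertainty rather than by the full energy spread.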
In this paper, we introduce a new nonlinear channel equalization method for
the coherent long-haul transmission based on Transformers. We show that due to
their capability to attend directly to the memory across a sequence of symbols,
Transformers can be used effectively with a parallelized structure. We present
an implementation of the encoder part of the Transformer for nonlinear equalization and
analyze its performance over a wide range of different hyper-parameters. It is
shown that by processing blocks of symbols at each iteration and carefully
selecting subsets of the encoder's output to be processed together, an
efficient nonlinear compensation can be achieved. We also propose the use of a
physics-informed mask inspired by nonlinear perturbation theory for reducing the
computational complexity of Transformer nonlinear equalization.
( 2
min )
In this paper, we investigate the robustness of an LSTM neural network
against noise injection attacks for electric load forecasting in an ideal
microgrid. The performance of the LSTM model is investigated under a black-box
Gaussian noise attack with different SNRs. It is assumed that attackers have
access only to the input data of the LSTM model. The results show that the
noise attack degrades the performance of the LSTM model. The load prediction
mean absolute error (MAE) is 0.047 MW for a healthy prediction, while this
value increases up to 0.097 MW for Gaussian noise insertion with an SNR of
6 dB. To robustify the LSTM model against the noise attack, a low-pass filter
with an optimal cut-off frequency is applied at the model's input to remove
the noise. The filter performs better for noise with lower SNR and is less
effective for weaker noise.
( 2
min )
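A minimal sketch of the defense described above: low-pass filtering a noisy load signal before it reaches the forecaster. The ideal FFT filter, cut-off frequency, and signal parameters here are illustrative stand-ins for the paper's setup:

```python
import numpy as np

def lowpass(x, dt, f_cut):
    """Ideal low-pass filter via FFT: zero out bins above the cut-off (sketch)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=dt)
    X[freqs > f_cut] = 0.0
    return np.fft.irfft(X, n=len(x))

# Slow "load" signal plus white Gaussian noise standing in for the attack.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 0.01)              # 1000 samples at 100 Hz
clean = np.sin(2 * np.pi * 0.5 * t)     # 0.5 Hz load profile
noisy = clean + rng.normal(0, 0.5, t.size)

filtered = lowpass(noisy, dt=0.01, f_cut=2.0)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_filt = np.mean((filtered - clean) ** 2)
```

Because the load profile lives at low frequencies while broadband noise spreads across the whole band, the filter removes most of the injected power, which is consistent with the observation that it helps more at lower SNR.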
The average treatment effect, which is the difference in expectation of the
counterfactuals, is probably the most popular target effect in causal inference
with binary treatments. However, treatments may have effects beyond the mean,
for instance decreasing or increasing the variance. We propose a new
kernel-based test for distributional effects of the treatment. It is, to the
best of our knowledge, the first kernel-based, doubly-robust test with provably
valid type-I error. Furthermore, our proposed algorithm is efficient, avoiding
the use of permutations.
( 2
min )
Focusing on stochastic programming (SP) with covariate information, this
paper proposes an empirical risk minimization (ERM) method embedded within a
nonconvex piecewise affine decision rule (PADR), which aims to learn the direct
mapping from features to optimal decisions. We establish the nonasymptotic
consistency result of our PADR-based ERM model for unconstrained problems and
asymptotic consistency result for constrained ones. To solve the nonconvex and
nondifferentiable ERM problem, we develop an enhanced stochastic
majorization-minimization algorithm and establish the asymptotic convergence to
(composite strong) directional stationarity along with complexity analysis. We
show that the proposed PADR-based ERM method applies to a broad class of
nonconvex SP problems with theoretical consistency guarantees and computational
tractability. Our numerical study demonstrates the superior performance of
PADR-based ERM methods compared to state-of-the-art approaches under various
settings, with significantly lower costs, less computation time, and robustness
to feature dimensions and nonlinearity of the underlying dependency.
( 2
min )
We study Langevin-type algorithms for sampling from Gibbs distributions such
that the potentials are dissipative and their weak gradients have finite moduli
of continuity not necessarily convergent to zero. Our main result is a
non-asymptotic upper bound of the 2-Wasserstein distance between the Gibbs
distribution and the law of general Langevin-type algorithms based on the
Liptser--Shiryaev theory and Poincar\'{e} inequalities. We apply this bound to
show that the Langevin Monte Carlo algorithm can approximate Gibbs
distributions with arbitrary accuracy if the potentials are dissipative and
their gradients are uniformly continuous. We also propose Langevin-type
algorithms with spherical smoothing for potentials without convexity or
continuous differentiability.
( 2
min )
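The Langevin Monte Carlo algorithm referenced above can be sketched in a few lines. Here it targets a standard Gaussian, whose potential U(x) = x²/2 is dissipative with a Lipschitz (hence uniformly continuous) gradient; step size and chain length are illustrative:

```python
import numpy as np

def langevin_samples(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: x <- x - step*grad_U(x) + sqrt(2*step)*xi."""
    x = x0
    out = np.empty(n_steps)
    for i in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.normal()
        out[i] = x
    return out

# Target: standard Gaussian, U(x) = x^2 / 2, so grad_U(x) = x.
rng = np.random.default_rng(0)
xs = langevin_samples(lambda x: x, x0=5.0, step=0.05, n_steps=50000, rng=rng)
burn = xs[5000:]  # discard burn-in
```

The discretization leaves a small step-size-dependent bias in the stationary law, which is exactly the kind of error the non-asymptotic Wasserstein bounds in the abstract quantify.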
We describe a direct approach to estimate bipartite mutual information of a
classical spin system based on Monte Carlo sampling enhanced by autoregressive
neural networks. It allows studying arbitrary geometries of subsystems and can
be generalized to classical field theories. We demonstrate it on the Ising
model for four partitionings, including a multiply-connected even-odd division.
We show that the area law is satisfied for temperatures away from the critical
temperature: the constant term is universal, whereas the proportionality
coefficient is different for the even-odd partitioning.
( 2
min )
The Hierarchical Vote Collective of Transformation-based Ensembles
(HIVE-COTE) is a heterogeneous meta ensemble for time series classification.
Since it was first proposed in 2016, the algorithm has undergone some minor
changes and there is now a configurable, scalable and easy to use version
available in two open source repositories. We present an overview of the latest
stable HIVE-COTE, version 1.0, and describe how it differs from the original. We
provide a walkthrough guide of how to use the classifier, and conduct extensive
experimental evaluation of its predictive performance and resource usage. We
compare the performance of HIVE-COTE to three recently proposed algorithms
using the aeon toolkit.
( 2
min )
Forecast reconciliation is an important research topic. Yet, there is
currently neither a formal framework nor a practical method for the
probabilistic reconciliation of count time series. In this paper we propose
definitions of coherency and of reconciled probabilistic forecasts that apply
to both real-valued and count variables, and a novel method for probabilistic
reconciliation. It is based on a generalization of Bayes' rule and can
reconcile both real-valued and count variables. When applied to count variables,
it yields a reconciled probability mass function. Our experiments with the
temporal reconciliation of count variables show a major forecast improvement
compared to the probabilistic Gaussian reconciliation.
( 2
min )
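A toy version of reconciliation via a generalization of Bayes' rule, loosely in the spirit of the method described above: treat the bottom forecasts as a prior and reweight by the incoherent total forecast evaluated at the implied sum. The Poisson base forecasts and this exact conditioning rule are illustrative assumptions, not the paper's construction:

```python
import numpy as np
from math import exp, factorial

def pois(k, m):
    """Poisson pmf."""
    return exp(-m) * m**k / factorial(k)

# Incoherent base forecasts: two bottom series and the total, all Poisson.
m1, m2, m_tot = 3.0, 5.0, 10.0   # m1 + m2 != m_tot: the forecasts disagree
K = 40                            # truncation of the count support

prior = np.array([[pois(b1, m1) * pois(b2, m2) for b2 in range(K)]
                  for b1 in range(K)])        # bottom forecasts as a prior
weight = np.array([[pois(b1 + b2, m_tot) for b2 in range(K)]
                   for b1 in range(K)])       # total forecast at the implied sum

post = prior * weight
post /= post.sum()                # reconciled joint pmf over (b1, b2)

idx = np.arange(K)
mean_b1 = float((idx[:, None] * post).sum())
mean_total = float(((idx[:, None] + idx[None, :]) * post).sum())
```

The reconciled result is a genuine probability mass function, and the implied total mean lands between the bottom-up sum (8) and the direct total forecast (10), illustrating how conditioning pools the two incoherent sources.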
In this work, we study the performance of the Thompson Sampling algorithm for
Contextual Bandit problems based on the framework introduced by Neu et al. and
their concept of lifted information ratio. First, we prove a comprehensive
bound on the Thompson Sampling expected cumulative regret that depends on the
mutual information of the environment parameters and the history. Then, we
introduce new bounds on the lifted information ratio that hold for sub-Gaussian
rewards, thus generalizing the results from Neu et al., whose analysis requires
binary rewards. Finally, we provide explicit regret bounds for the special
cases of unstructured bounded contextual bandits, structured bounded contextual
bandits with Laplace likelihood, structured Bernoulli bandits, and bounded
linear contextual bandits.
( 2
min )
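For intuition, Thompson Sampling in its simplest non-contextual Bernoulli form maintains a Beta posterior per arm, samples from each posterior, and plays the argmax; the contextual analysis above generalizes this idea. Arm means and horizon below are invented for illustration:

```python
import numpy as np

def thompson_bernoulli(true_means, n_rounds, seed=0):
    """Thompson Sampling with Beta(1,1) priors for Bernoulli rewards."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha, beta = np.ones(k), np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        theta = rng.beta(alpha, beta)        # one sample per posterior
        a = int(np.argmax(theta))            # play the apparently best arm
        r = rng.random() < true_means[a]     # Bernoulli reward
        alpha[a] += r                        # posterior update
        beta[a] += 1 - r
        pulls[a] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.9], n_rounds=2000)
```

As the posteriors concentrate, the sampled indices increasingly favor the best arm, so play concentrates there and regret grows sublinearly.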
In this article, we examine how recommender systems are implemented and the
algorithms they use. We explain recommender-system algorithms in terms of
their underlying mathematical principles and identify feasible directions for
improvement. Probability-based algorithms play a significant role in
recommender systems, and we describe how they help to increase the accuracy
and speed of the algorithms. Both the strengths and the weaknesses of two
different mathematical distance measures used to describe similarity are
illustrated in detail.
( 2
min )
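The two distance measures such articles typically contrast are cosine similarity and Euclidean distance; a small NumPy example shows the scale-sensitivity difference that drives their respective strengths and weaknesses (the rating vectors are invented for illustration):

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity: angle between vectors, blind to magnitude."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_dist(u, v):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return float(np.linalg.norm(u - v))

# Two users with identical taste profiles but different rating scales.
a = np.array([5.0, 4.0, 1.0])
b = np.array([2.5, 2.0, 0.5])   # same pattern, half the magnitude

cos_ab = cosine_sim(a, b)       # 1.0: cosine sees the users as identical
euc_ab = euclidean_dist(a, b)   # large: Euclidean separates them
```

This is why cosine similarity is often preferred when users rate on different personal scales, while Euclidean distance is preferable when absolute rating levels carry signal.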
Do you need help to move your organization’s Machine Learning (ML) journey from pilot to production? You’re not alone. Most executives think ML can apply to any business decision, but on average only half of the ML projects make it to production. This post describes how to implement your first ML use case using Amazon […]
( 9
min )
Spotlighted by this week’s In the NVIDIA Studio featured artist Unmesh Dinda, NVIDIA Broadcast transforms the homes, apartments and dorm rooms of content creators, livestreamers and people working from home through the power of AI — all without the need for specialized equipment.
( 7
min )
Imagine a future where your vehicle’s interior offers personalized experiences and builds trust through human-machine interfaces (HMI) and AI. In this episode of the NVIDIA AI Podcast, Andreas Binner, chief technology officer at Rightware, delves into this fascinating topic with host Katie Burke Washabaugh. Rightware is a Helsinki-based company at the forefront of developing in-vehicle …
( 5
min )
We recently introduced a new capability in the Amazon SageMaker Python SDK that lets data scientists run their machine learning (ML) code authored in their preferred integrated developer environment (IDE) and notebooks along with the associated runtime dependencies as Amazon SageMaker training jobs with minimal code changes to the experimentation done locally. Data scientists typically […]
( 13
min )
Many organizations use Gmail for their business email needs. Gmail for Business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Google Docs, Google Sheets, and more. For any organization, emails contain a wealth of information, which could be within the subject of an email, the message […]
( 9
min )
Announcements Tech Layoffs and Uncertainty Raise Big Questions for Higher Education Mass layoffs continue across the tech industry, with tens of thousands of workers losing their jobs in the first quarter of 2023. The reductions occurred from small startups to the biggest names in tech — Google, Amazon, Microsoft. Core technical roles such as data…
The post DSC Weekly 25 April 2023 – Tech Layoffs and Uncertainty Raise Big Questions for Higher Education appeared first on Data Science Central.
( 19
min )
Internal CPU Accelerators and HBM Enable Faster and Smarter HPC and AI Applications We have now entered the era when processor designers can leverage modular semiconductor manufacturing capabilities to speed frequently performed operations (such as small tensor operations) and offload a variety of housekeeping tasks (such as copying and zeroing memory) to dedicated on-chip accelerators. The…
The post Internal CPU Accelerators and HBM Enable Faster and Smarter HPC and AI Applications appeared first on Data Science Central.
( 33
min )
Newly released open-source software can help developers guide generative AI applications to create impressive text responses that stay on track. NeMo Guardrails will help ensure smart applications powered by large language models (LLMs) are accurate, appropriate, on topic and secure. The software includes all the code, examples and documentation businesses need to add safety to …
( 6
min )
ChatGPT users can now turn off chat history, allowing you to choose which conversations can be used to train our models.
( 2
min )
In the world of machine learning (ML), the quality of the dataset is of significant importance to model predictability. Although more data is usually better, large datasets with a high number of features can sometimes lead to non-optimal model performance due to the curse of dimensionality. Analysts can spend a significant amount of time transforming […]
( 9
min )
According to a PwC report, 32% of retail customers churn after one negative
( 8
min )
TLA+ is a high level, open-source, math-based language for modeling computer programs and systems–especially concurrent and distributed ones. It comes with tools to help eliminate fundamental design errors, which are hard to find and expensive to fix once they have been embedded in code or hardware. The TLA language was first published in 1993 by the […]
The post TLA+ Foundation aims to bring math-based software modeling to the mainstream appeared first on Microsoft Research.
( 9
min )
Along with Markov chain Monte Carlo (MCMC) methods, variational inference
(VI) has emerged as a central computational approach to large-scale Bayesian
inference. Rather than sampling from the true posterior $\pi$, VI aims at
producing a simple but effective approximation $\hat \pi$ to $\pi$ for which
summary statistics are easy to compute. However, unlike the well-studied MCMC
methodology, algorithmic guarantees for VI are still less well
understood. In this work, we propose principled methods for VI, in which
$\hat \pi$ is taken to be a Gaussian or a mixture of Gaussians, which rest upon
the theory of gradient flows on the Bures--Wasserstein space of Gaussian
measures. Akin to MCMC, it comes with strong theoretical guarantees when $\pi$
is log-concave.
( 2
min )
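As a degenerate one-dimensional illustration of Gaussian VI (plain gradient descent on the variational parameters, not the Bures-Wasserstein gradient flow the paper develops), one can fit q = N(m, s²) to a Gaussian target by minimizing the closed-form KL divergence; the target and learning rate are invented for illustration:

```python
import numpy as np

# Target: pi = N(mu=3, sigma=2); variational family: q = N(m, s^2).
mu, sigma = 3.0, 2.0

def kl_gauss(m, s):
    """KL(q || pi) in closed form for univariate Gaussians."""
    return np.log(sigma / s) + (s**2 + (m - mu)**2) / (2 * sigma**2) - 0.5

# Gradient descent on (m, s); the gradients below differentiate kl_gauss.
m, s = 0.0, 1.0
lr = 0.5
for _ in range(500):
    grad_m = (m - mu) / sigma**2
    grad_s = -1.0 / s + s / sigma**2
    m -= lr * grad_m
    s -= lr * grad_s
```

Here the target is itself Gaussian so q matches it exactly; for a general log-concave target the KL is no longer available in closed form, which is where the theory of gradient flows over Gaussian measures earns its keep.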
We consider using gradient descent to minimize the nonconvex function
$f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is
an underlying smooth convex cost function defined over $n\times n$ matrices.
While only a second-order stationary point $X$ can be provably found in
reasonable time, if $X$ is additionally rank deficient, then its rank
deficiency certifies it as being globally optimal. This way of certifying
global optimality necessarily requires the search rank $r$ of the current
iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the
global minimizer $X^{\star}$. Unfortunately, overparameterization significantly
slows down the convergence of gradient descent, from a linear rate with
$r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is
strongly convex. In this paper, we propose an inexpensive preconditioner that
restores the convergence rate of gradient descent back to linear in the
overparameterized case, while also making it agnostic to possible
ill-conditioning in the global minimizer $X^{\star}$.
( 2
min )
Neutron scattering experiments at three-axes spectrometers (TAS) investigate
magnetic and lattice excitations by measuring intensity distributions to
understand the origins of materials properties. The high demand and limited
availability of beam time for TAS experiments, however, raise the natural
question of whether we can improve their efficiency and make better use of the
experimenter's time. In fact, there are a number of scientific problems that
require searching for signals, which may be time consuming and inefficient if
done manually due to measurements in uninformative regions. Here, we describe a
probabilistic active learning approach that not only runs autonomously, i.e.,
without human interference, but can also directly provide locations for
informative measurements in a mathematically sound and methodologically robust
way by exploiting log-Gaussian processes. We demonstrate the resulting
benefits on a real TAS experiment and on a benchmark including numerous
different excitations.
( 2
min )
Conservative inference is a major concern in simulation-based inference. It
has been shown that commonly used algorithms can produce overconfident
posterior approximations. Balancing has empirically proven to be an effective
way to mitigate this issue. However, its application remains limited to neural
ratio estimation. In this work, we extend balancing to any algorithm that
provides a posterior density. In particular, we introduce a balanced version of
both neural posterior estimation and contrastive neural ratio estimation. We
show empirically that the balanced versions tend to produce conservative
posterior approximations on a wide variety of benchmarks. In addition, we
provide an alternative interpretation of the balancing condition in terms of
the $\chi^2$ divergence.
( 2
min )
Recent breakthroughs in NLP have greatly increased the presence of ASR systems in
our daily lives. However, for many low-resource languages, ASR models still
need to be improved due in part to the difficulty of acquiring pertinent data.
This project aims to help advance research in ASR models for Swiss German
dialects, by providing insights about the performance of state-of-the-art ASR
models on recently published Swiss German speech datasets. We propose a novel
loss that takes into account the semantic distance between the predicted and
the ground-truth labels. We outperform current state-of-the-art results by
fine-tuning OpenAI's Whisper model on Swiss-German datasets.
( 2
min )
This article presents a leak localization methodology based on state
estimation and learning. The first is handled by an interpolation scheme,
whereas dictionary learning is considered for the second stage. The novel
proposed interpolation technique exploits the physics of the interconnections
between hydraulic heads of neighboring nodes in water distribution networks.
Additionally, residuals are directly interpolated instead of hydraulic head
values. The results of applying the proposed method to a well-known case study
(Modena) demonstrated the improvements of the new interpolation method with
respect to a state-of-the-art approach, both in terms of interpolation error
(considering state and residual estimation) and posterior localization.
( 2
min )
The use of machine learning (ML) inference for various applications is
growing drastically. ML inference services engage with users directly,
requiring fast and accurate responses. Moreover, these services face dynamic
workloads of requests, imposing changes in their computing resources. Failing
to right-size computing resources results in either latency service level
objectives (SLOs) violations or wasted computing resources. Adapting to dynamic
workloads considering all the pillars of accuracy, latency, and resource cost
is challenging. In response to these challenges, we propose InfAdapter, which
proactively selects a set of ML model variants with their resource allocations
to meet latency SLO while maximizing an objective function composed of accuracy
and cost. InfAdapter decreases SLO violations and costs by up to 65% and 33%,
respectively, compared to a popular industry autoscaler (Kubernetes Vertical
Pod Autoscaler).
( 2
min )
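The selection problem InfAdapter addresses can be caricatured as: among model variants with known accuracy, latency, and cost, pick one that meets the latency SLO while maximizing an accuracy-minus-cost objective. The variant catalog, names, and numbers below are invented for illustration (the real system selects sets of variants with resource allocations):

```python
# Toy variant catalog: accuracy, p99 latency (ms), and cost ($/hr) per variant.
variants = [
    {"name": "small",  "acc": 0.72, "latency": 20, "cost": 1.0},
    {"name": "medium", "acc": 0.78, "latency": 45, "cost": 2.5},
    {"name": "large",  "acc": 0.83, "latency": 90, "cost": 6.0},
]

def pick_variant(variants, slo_ms, cost_weight):
    """Choose the variant meeting the latency SLO maximizing acc - w*cost."""
    feasible = [v for v in variants if v["latency"] <= slo_ms]
    if not feasible:
        return None  # no variant meets the SLO; must scale out instead
    return max(feasible, key=lambda v: v["acc"] - cost_weight * v["cost"])

choice = pick_variant(variants, slo_ms=50, cost_weight=0.01)
```

Re-running this selection as the workload (and hence per-variant latency) changes captures, in miniature, the proactive adaptation the abstract describes.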
Deploying machine learning models in production may allow adversaries to
infer sensitive information about training data. There is a vast literature
analyzing different types of inference risks, ranging from membership inference
to reconstruction attacks. Inspired by the success of games (i.e.,
probabilistic experiments) to study security properties in cryptography, some
authors describe privacy inference risks in machine learning using a similar
game-based style. However, adversary capabilities and goals are often stated in
subtly different ways from one presentation to the other, which makes it hard
to relate and compose results. In this paper, we present a game-based framework
to systematize the body of knowledge on privacy inference risks in machine
learning. We use this framework to (1) provide a unifying structure for
definitions of inference risks, (2) formally establish known relations among
definitions, and (3) uncover hitherto unknown relations that would have been
difficult to spot otherwise.
( 2
min )
Hyperparameter optimization (HPO) is crucial for strong performance of deep
learning algorithms, and real-world applications often impose constraints,
such as memory usage or latency, on top of the performance requirement. In this
work, we propose constrained TPE (c-TPE), an extension of the widely-used
versatile Bayesian optimization method, tree-structured Parzen estimator (TPE),
to handle these constraints. Our proposed extension goes beyond a simple
combination of an existing acquisition function and the original TPE, and
instead includes modifications that address issues that cause poor performance.
We thoroughly analyze these modifications both empirically and theoretically,
providing insights into how they effectively overcome these challenges. In the
experiments, we demonstrate that c-TPE exhibits the best average rank
performance among existing methods with statistical significance on 81
expensive HPO settings.
( 2
min )
How do you scale a machine learning product at a startup? In particular, how
do you serve a greater volume, velocity, and variety of queries
cost-effectively? We break down costs into variable costs (the cost of serving
the model performantly) and fixed costs (the cost of developing and training
new models). We propose a framework for conceptualizing these costs, breaking
them into finer categories, and limn ways to reduce them. Lastly, since in our
experience, the most expensive fixed cost of a machine learning system is the
cost of identifying the root causes of failures and driving continuous
improvement, we present a way to conceptualize the issues and share our
methodology for the same.
( 2
min )
We introduce a novel self-attention mechanism, Chromatic Self-Attention
(CSA), which extends the notion of attention scores to attention
_filters_, independently modulating the feature channels. We showcase CSA in a
fully-attentional graph Transformer CGT (Chromatic Graph Transformer) which
integrates both graph structural information and edge features, completely
bypassing the need for local message-passing components. Our method flexibly
encodes graph structure through node-node interactions, by enriching the
original edge features with a relative positional encoding scheme. We propose a
new scheme based on random walks that encodes both structural and positional
information, and show how to incorporate higher-order topological information,
such as rings in molecular graphs. Our approach achieves state-of-the-art
results on the ZINC benchmark dataset, while providing a flexible framework for
encoding graph structure and incorporating higher-order topology.
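The random-walk idea can be illustrated with a small sketch (our own reading of the general scheme, not the paper's exact encoding): stack powers of the random-walk transition matrix, so that entry (i, j, k) holds the probability of reaching node j from node i in k+1 steps.

```python
import numpy as np

def rw_encoding(A, K=2):
    """Relative positional encoding from random-walk matrix powers.
    Assumes no isolated nodes (every row of A has a nonzero sum)."""
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    powers, Pk = [], np.eye(len(A))
    for _ in range(K):
        Pk = Pk @ P                        # k-step walk probabilities
        powers.append(Pk)
    return np.stack(powers, axis=-1)       # shape (n, n, K)

# 4-cycle graph: from any node, a 2-step walk returns with probability 1/2
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
E = rw_encoding(A, K=2)
```

Such per-pair feature vectors can then enrich edge features before the attention computation.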
( 2
min )
This article presents an identification benchmark based on data from a public
swimming pool in operation. Such a system is both a complex process and one
whose stakes are easily understood by all. Ultimately, the objective is
to reduce the energy bill while maintaining the level of quality of service.
This objective is general in scope and is not limited to public swimming pools.
This can be done effectively through what is known as economic predictive
control. This type of advanced control is based on a process model. It is the
aim of this article and the considered benchmark to show that such a dynamic
model can be obtained from operating data. For this, operational data is
formatted and shared, and model quality indicators are proposed. On this basis,
the first identification results illustrate what is obtained by a linear
multivariable model on the one hand, and by a neural dynamic model on the other
hand. The benchmark calls for further proposals and results from control and
data scientists for comparison.
( 2
min )
This short note describes and proves a connectedness property which was
introduced in Blocher et al. [2023] in the context of data depth functions for
partial orders. The connectedness property gives a structural insight into
union-free generic sets. These sets, presented in Blocher et al. [2023], are
defined by using a closure operator on the set of all partial orders which
naturally appears within the theory of formal concept analysis. In the language
of formal concept analysis, the property of connectedness can be vividly
proven. However, since within Blocher et al. [2023] we did not discuss formal
concept analysis, we outsourced the proof to this note.
( 2
min )
Exploration is a fundamental aspect of reinforcement learning (RL), and its
effectiveness crucially decides the performance of RL algorithms, especially
when facing sparse extrinsic rewards. Recent studies showed the effectiveness
of encouraging exploration with intrinsic rewards estimated from novelty in
observations. However, there is a gap between the novelty of an observation
and exploration in general, because both the stochasticity of the environment
and the behavior of the agent affect the observation. To estimate exploratory
behaviors accurately, we propose DEIR, a novel method where we theoretically
derive an intrinsic reward from a conditional mutual information term that
principally scales with the novelty contributed by agent explorations, and
materialize the reward with a discriminative forward model. We conduct
extensive experiments in both standard and hardened exploration games in
MiniGrid to show that DEIR quickly learns a better policy than baselines. Our
evaluations in ProcGen demonstrate both generalization capabilities and the
general applicability of our intrinsic reward.
( 2
min )
Recent years have seen a rich literature of data-driven approaches designed
for power grid applications. However, insufficient consideration of domain
knowledge can impose a high risk to the practicality of the methods.
Specifically, ignoring the grid-specific spatiotemporal patterns (in load,
generation, and topology, etc.) can lead to outputting infeasible,
unrealizable, or completely meaningless predictions on new inputs. To address
this concern, this paper investigates real-world operational data to provide
insights into power grid behavioral patterns, including the time-varying
topology, load, and generation, as well as the spatial differences (in peak
hours, diverse styles) between individual loads and generations. Then based on
these observations, we evaluate the generalization risks in some existing ML
works caused by ignoring these grid-specific patterns in model design and
training.
( 2
min )
It is difficult to identify anomalies in time series, especially when there
is a lot of noise. Denoising techniques can remove the noise, but at the cost
of a significant loss of information. To detect anomalies in the time
series, we propose an attention-free conditional autoencoder (AF-CA). We
start from the conditional autoencoder model, to which we add an
Attention-Free LSTM layer \cite{inzirillo2022attention} in order to make the
anomaly detection capacity more reliable and to increase the power of anomaly
detection. We compare the results of our Attention-Free Conditional
Autoencoder with those of an LSTM Autoencoder and clearly improve the
explanatory power of the model and therefore the detection of anomalies in
noisy time series.
( 2
min )
This article measures how sparsity can make neural networks more robust to
membership inference attacks. The obtained empirical results show that sparsity
improves the privacy of the network, while preserving comparable performances
on the task at hand. This empirical study completes and extends existing
literature.
( 2
min )
In Part I of the series “Creating Healthy AI Utility Function: Importance of Diversity,” I talked about the importance of embracing conflict and diversity to create a Healthy AI Utility Function; that is, creating an AI Utility Function that continuously balances conflicting KPIs and metrics to deliver responsible and ethical outcomes.
The post Creating Healthy AI Utility Function: ChatGPT Example – Part II appeared first on Data Science Central.
( 21
min )
Just in last 1 year, top 0.1% saw their wealth increase by 6 trillion dollars, bigger than wealth of most countries. https://www.cnbc.com/amp/2022/04/01/richest-one-percent-gained-trillions-in-wealth-2021.html
submitted by /u/timesarewasting
( 43
min )
For those of you interested in diving into the future of AI with some of the worlds leading AI experts, my company is hosting this free virtual event.
Kris Hammond (advises the U.N. and White House on AI) and his Northwestern students built us a custom AI/deepfake chat bot that will actually be on the panel answering questions and engaging in discussion…talk about Black Mirror situations. It should get interesting.
For those getting into AI or that understand how important it is for remaining competitive in your career, you should def check it out.
Here’s a link: https://chicagoinnovation.com/events/ai-vs-iq/
submitted by /u/chickenfettuccine
( 43
min )
With the advances of IoT developments, copious sensor data are communicated
through wireless networks and create the opportunity of building Digital Twins
to mirror and simulate the complex physical world. Digital Twin has long been
believed to rely heavily on domain knowledge, but we argue that this leads to a
high barrier of entry and slow development due to the scarcity and cost of
human experts. In this paper, we propose Digital Twin Graph (DTG), a general
data structure associated with a processing framework that constructs digital
twins in a fully automated and domain-agnostic manner. This work represents the
first effort that takes a completely data-driven and (unconventional) graph
learning approach to address key digital twin challenges.
( 2
min )
This study proposes a deep learning model for the classification and
segmentation of brain tumors from magnetic resonance imaging (MRI) scans. The
classification model is based on the EfficientNetB1 architecture and is trained
to classify images into four classes: meningioma, glioma, pituitary adenoma,
and no tumor. The segmentation model is based on the U-Net architecture and is
trained to accurately segment the tumor from the MRI images. The models are
evaluated on a publicly available dataset and achieve high accuracy and
segmentation metrics, indicating their potential for clinical use in the
diagnosis and treatment of brain tumors.
( 2
min )
Questions remain on the robustness of data-driven learning methods when
crossing the gap from simulation to reality. We utilize weight anchoring, a
method known from continual learning, to cultivate and fixate desired behavior
in Neural Networks. Weight anchoring may be used to find a solution to a
learning problem that is near the solution of another learning problem.
Thereby, learning can be carried out in optimal environments without neglecting
or unlearning desired behavior. We demonstrate this approach on the example of
learning mixed QoS-efficient discrete resource scheduling with infrequent
priority messages. Results show that this method provides performance
comparable to the state of the art of augmenting a simulation environment,
alongside significantly increased robustness and steerability.
( 2
min )
This work brings the leading accuracy, sample efficiency, and robustness of
deep equivariant neural networks to the extreme computational scale. This is
achieved through a combination of innovative model architecture, massive
parallelization, and models and implementations optimized for efficient GPU
utilization. The resulting Allegro architecture bridges the accuracy-speed
tradeoff of atomistic simulations and enables description of dynamics in
structures of unprecedented complexity at quantum fidelity. To illustrate the
scalability of Allegro, we perform nanoseconds-long stable simulations of
protein dynamics and scale up to a 44-million atom structure of a complete,
all-atom, explicitly solvated HIV capsid on the Perlmutter supercomputer. We
demonstrate excellent strong scaling up to 100 million atoms and 70% weak
scaling to 5120 A100 GPUs.
( 2
min )
The K Nearest Neighbors (KNN) classifier is widely used in many fields such
as fingerprint-based localization or medicine. It determines the class
membership of an unlabelled sample based on the class memberships of the K
labelled samples, the so-called nearest neighbors, that are closest to the
unlabelled sample. The choice of K has been the topic of various studies and
proposed KNN-variants. Yet no variant has been proven to outperform all other
variants. In this paper a new KNN-variant is proposed which ensures that the K
nearest neighbors are indeed close to the unlabelled sample and finds K along
the way. The proposed algorithm is tested and compared to the standard KNN in
theoretical scenarios and for indoor localization based on ion-mobility
spectrometry fingerprints. It achieves a higher classification accuracy than
the standard KNN in the tests, while having the same computational demand.
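For context, the standard KNN baseline can be sketched in a few lines (a minimal illustration with toy data of our own, not the proposed variant):

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest labelled samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # majority vote

# Two well-separated 2-D classes
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X, y, np.array([0.15, 0.1])))  # -> 0
```

The proposed variant differs precisely in how K is chosen, ensuring the selected neighbors are genuinely close to the query point.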
( 2
min )
Kernel-based modal statistical methods include mode estimation, regression,
and clustering. Estimation accuracy of these methods depends on the kernel used
as well as the bandwidth. We study the effect of the kernel selection on the
estimation accuracy of these methods. In particular, we
theoretically show a (multivariate) optimal kernel that minimizes its
analytically-obtained asymptotic error criterion when using an optimal
bandwidth, among a certain kernel class defined via the number of its sign
changes.
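As a concrete illustration of the kernel's role, here is a minimal kernel density mode estimate with a Gaussian kernel and a fixed bandwidth (the data and constants are illustrative; the paper's contribution concerns which kernel is asymptotically optimal):

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian-kernel density estimate evaluated on a grid."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Mixture whose dominant mode sits near 0
x = np.concatenate([rng.normal(0.0, 0.5, 400), rng.normal(3.0, 1.0, 100)])
grid = np.linspace(-3.0, 6.0, 901)
mode = grid[np.argmax(kde(grid, x, h=0.3))]     # grid-search mode estimate
```

Changing the kernel or the bandwidth shifts this estimate, which is exactly the sensitivity the paper quantifies.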
( 2
min )
Quantum computation has strong implications for advancing beyond the current
limitations of machine learning algorithms, whether in handling higher data
dimensions or in reducing the overall number of training parameters of a deep
neural network model. Based on a gate-based quantum computer, a parameterized
quantum circuit (PQC) was designed to solve a model-free reinforcement learning
problem with the deep-Q learning method, and this research has investigated and
evaluated its potential. A novel PQC based on the latest Qiskit and PyTorch
frameworks was designed and trained for comparison with a fully classical deep
neural network with and without an integrated PQC. The research closes with
conclusions and prospects for developing deep quantum learning to solve a maze
problem or other reinforcement learning problems.
( 2
min )
This paper presents two novel deterministic initialization procedures for
K-means clustering based on a modified crowding distance. The procedures, named
CKmeans and FCKmeans, use more crowded points as initial centroids.
Experimental studies on multiple datasets demonstrate that the proposed
approach outperforms Kmeans and Kmeans++ in terms of clustering accuracy. The
effectiveness of CKmeans and FCKmeans is attributed to their ability to select
better initial centroids based on the modified crowding distance. Overall, the
proposed approach provides a promising alternative for improving K-means
clustering.
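The flavor of crowding-based initialization can be sketched as follows, using a k-nearest-neighbor density proxy of our own as a stand-in for the paper's modified crowding distance (the separation heuristic is also an assumption):

```python
import numpy as np

def crowded_init(X, n_clusters, n_neighbors=3):
    """Deterministically pick initial centroids among 'crowded' points."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    # Smaller mean distance to the nearest neighbors = more crowded
    crowding = np.sort(d, axis=1)[:, :n_neighbors].mean(axis=1)
    min_sep = 2.0 * np.median(crowding)        # heuristic separation radius
    chosen = []
    for i in np.argsort(crowding):             # most crowded first
        if all(d[i, j] > min_sep for j in chosen):
            chosen.append(i)
        if len(chosen) == n_clusters:
            break
    return X[np.array(chosen)]

blob = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
X = np.vstack([blob, blob + 10.0])
centroids = crowded_init(X, n_clusters=2)      # one point from each dense blob
```

Seeding K-means from such dense, mutually separated points is what the paper credits for the improved clustering accuracy.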
( 2
min )
We used survival analysis to quantify the impact of postdischarge evaluation
and management (E/M) services in preventing hospital readmission or death. Our
approach avoids a specific pitfall of applying machine learning to this
problem, which is an inflated estimate of the effect of interventions, due to
survivorship bias -- where the magnitude of inflation may be conditional on
heterogeneous confounders in the population. This bias arises simply because in
order to receive an intervention after discharge, a person must not have been
readmitted in the intervening period. After deriving an expression for this
phantom effect, we controlled for this and other biases within an inherently
interpretable Bayesian survival framework. We identified case management
services as being the most impactful for reducing readmissions overall,
particularly for patients discharged to long term care facilities, with high
resource utilization in the quarter preceding admission.
( 2
min )
We study the impacts of business cycles on machine learning (ML) predictions.
Using the S&P 500 index, we find that ML models perform worse during most
recessions, and the inclusion of recession history or the risk-free rate does
not necessarily improve their performance. Investigating recessions where
models perform well, we find that they exhibit lower market volatility than
other recessions. This implies that the improved performance is not due to the
merit of ML methods but rather factors such as effective monetary policies that
stabilized the market. We recommend that ML practitioners evaluate their models
during both recessions and expansions.
( 2
min )
We propose a framework for descriptively analyzing sets of partial orders
based on the concept of depth functions. Despite intensive studies of depth
functions in linear and metric spaces, there is very little discussion on depth
functions for non-standard data types such as partial orders. We introduce an
adaptation of the well-known simplicial depth to the set of all partial orders,
the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a
comparison of machine learning algorithms based on multidimensional performance
measures. Concretely, we analyze the distribution of different classifier
performances over a sample of standard benchmark data sets. Our results
promisingly demonstrate that our approach differs substantially from existing
benchmarking approaches and, therefore, adds a new perspective to the vivid
debate on the comparison of classifiers.
( 2
min )
Support vector clustering is an important clustering method. However, it
suffers from a scalability issue due to its computationally expensive cluster
assignment step. In this paper we accelerate support vector clustering via
spectrum-preserving data compression. Specifically, we first compress the
original data set into a small number of spectrally representative aggregated
data points. Then, we perform standard support vector clustering on the
compressed data set. Finally, we map the clustering results of the compressed
data set back to discover the clusters in the original data set. Our extensive
experimental results on real-world data sets demonstrate dramatic speedups
over standard support vector clustering without sacrificing clustering quality.
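The compress/cluster/map-back structure can be sketched as follows; in this toy version, k-means stands in both for the spectrum-preserving compression and for support vector clustering itself, purely to show the pipeline shape:

```python
import numpy as np

def _kmeans(X, k, iters=50):
    """Tiny deterministic Lloyd's k-means (spread initialization along x)."""
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, k).astype(int)]].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def compressed_clustering(X, n_agg, n_clusters):
    """Compress, cluster the small set, then map labels back."""
    reps, agg_labels = _kmeans(X, n_agg)          # 1) compress to aggregates
    _, rep_clusters = _kmeans(reps, n_clusters)   # 2) cluster the aggregates
    return rep_clusters[agg_labels]               # 3) map back to all points

blob0 = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.25],
                  [0.05, 0.15], [0.25, 0.05], [0.15, 0.2], [0.1, 0.1], [0.3, 0.3]])
X = np.vstack([blob0, blob0 + 10.0])
labels = compressed_clustering(X, n_agg=4, n_clusters=2)
```

The expensive step runs only on the n_agg aggregates instead of all n points, which is where the speedup comes from.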
( 2
min )
There isn’t a foolproof formula for building a successful digital firm — the risk of starting a business is high. There’s more to the frequently cited statistic that nine out of ten companies fail — a reason you should check out this step-by-step guide to starting a successful startup. The COVID-19 pandemic has put pressure…
The post 5 Crucial Steps To Starting A Successful Hi-Tech Startup: From Idea To Promotion appeared first on Data Science Central.
( 21
min )
From climate modeling to endangered species conservation, developers, researchers and companies are keeping an AI on the environment with the help of NVIDIA technology. They’re using NVIDIA GPUs and software to track endangered African black rhinos, forecast the availability of solar energy in the U.K., build detailed climate models and monitor environmental disasters from satellite
( 7
min )
Content creators using Epic Games’ open, advanced real-time 3D creation tool, Unreal Engine, are now equipped with more features to bring their work to life with NVIDIA Omniverse, a platform for creating and operating metaverse applications. The Omniverse Connector for Unreal Engine’s 201.0 update brings significant enhancements to creative workflows using both open platforms. Streamlining
( 6
min )
What’s the difference between NVIDIA GeForce RTX 30 and 40 Series GPUs for gamers? To briefly set aside the technical specifications, the difference lies in the level of performance and capability each series offers. Both deliver great graphics. Both offer advanced new features driven by NVIDIA’s global AI revolution a decade ago. Either can power
( 6
min )
Batch inference is a common pattern where prediction requests are batched together on input, a job runs to process those requests against a trained model, and the output includes batch prediction responses that can then be consumed by other applications or business functions. Running batch use cases in production environments requires a repeatable process for […]
( 14
min )
The technology of MIT alumni-founded Hosta a.i. creates detailed property assessments from photos.
( 9
min )
Hello! Not sure if this is the right place to ask.
I am working on a startup, I was wondering what people think are some gaps in current machine learning infrastructure solutions like WandB, or Neptune.ai.
I'd love to know what people think are some missing features for products like these, or what completely new features they would like to see!
submitted by /u/spirited__tree
( 43
min )
Hi all,
Hope you are all well. Last time I posted about the fastLLaMa project on here, I had a lot of support from you guys and I really appreciated it. Motivated me to try random experiments and new things!
Thought I would give an update after a month.
Yesterday we added support to enable users to attach and detach LoRA adapters quickly during the runtime. This work was built on top of the original llama.cpp repo with some modifications that impact the adapter size (We are figuring out ways to reduce the adapter size through possible quantization).
We also built on top of our save load feature to enable quick context switching during run time! This should enable a single running instance to serve multiple sessions.
We were also grateful for the feature requests from the last post a…
( 46
min )
More than 50 automotive companies around the world have deployed over 800 autonomous test vehicles powered by the NVIDIA DRIVE Hyperion automotive compute architecture, which has recently achieved new safety milestones. The latest NVIDIA DRIVE Hyperion architecture is based on the DRIVE Orin system-on-a-chip (SoC). Many NVIDIA DRIVE processes, as well as hardware and software
( 5
min )
GFN Thursday rolls up this week with a hot new deal for a GeForce NOW six-month Priority membership. Enjoy the cloud gaming service with seven new games to stream this week, including more favorites from Bandai Namco Europe and F1 2021 from Electronic Arts. Make Gaming a Priority Starting today, GeForce NOW is offering a
( 6
min )
NVIDIA today recognized a dozen partners for their work helping customers in Europe, the Middle East and Africa harness the power of AI across industries. At a virtual EMEA Partner Day event, which was hosted by the NVIDIA Partner Network (NPN) and drew more than 750 registrants, Partner of the Year awards were given to
( 6
min )
Each machine learning (ML) system has a unique service level agreement (SLA) requirement with respect to latency, throughput, and cost metrics. With advancements in hardware design, a wide range of CPU- and GPU-based infrastructures are available to help you speed up inference performance. Also, you can build these ML systems with a combination of ML […]
( 11
min )
These tunable proteins could be used to create new materials with specific mechanical properties, like toughness or flexibility.
( 10
min )
This study introduces and investigates the capabilities of three different
text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet
Allocation, and Clustering Word Vectors, for automating code extraction from a
relatively small discussion board dataset. We compare the outputs of each
algorithm with a previous dataset that was manually coded by two human raters.
The results show that even with a relatively small dataset, automated
approaches can be an asset to course instructors by extracting some of the
discussion codes, which can be used in Epistemic Network Analysis.
( 2
min )
Mining data streams is one of the main studies in machine learning area due
to its application in many knowledge areas. One of the major challenges on
mining data streams is concept drift, which requires the learner to discard the
current concept and adapt to a new one. Ensemble-based drift detection
algorithms have been used successfully for the classification task but usually
maintain a fixed-size ensemble of learners, running the risk of needlessly
spending processing time and memory. In this paper we present improvements to
the Scale-free Network Regressor (SFNR), a dynamic ensemble-based method for
regression that employs social networks theory. In order to detect concept
drifts SFNR uses the Adaptive Window (ADWIN) algorithm. Results show
improvements in accuracy, especially in concept drift situations and better
performance compared to other state-of-the-art algorithms in both real and
synthetic data.
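The windowed intuition behind ADWIN-style detection can be conveyed with a toy fixed-split check (a deliberate simplification: the real ADWIN adapts its window and uses a rigorous statistical bound rather than a fixed threshold):

```python
import numpy as np

def drift_detected(errors, split, threshold=0.3):
    """Toy stand-in for ADWIN: flag drift when the mean error of the
    recent window departs from the older window by more than threshold."""
    old = np.mean(errors[:split])
    new = np.mean(errors[split:])
    return bool(abs(new - old) > threshold)

stable = np.array([0.10, 0.12, 0.11, 0.09] * 10)
drifted = np.concatenate([stable, np.full(20, 0.85)])  # error jumps: drift
```

When drift is flagged, an ensemble method like SFNR discards or rebuilds learners instead of keeping a stale fixed-size ensemble.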
( 2
min )
The quality of air is closely linked with the life quality of humans,
plantations, and wildlife. It needs to be monitored and preserved continuously.
Transportations, industries, construction sites, generators, fireworks, and
waste burning have a major percentage in degrading the air quality. These
sources are required to be used in a safe and controlled manner. Using
traditional laboratory analysis or installing bulk and expensive models every
few miles is no longer efficient. Smart devices are needed for collecting and
analyzing air data. The quality of air depends on various factors, including
location, traffic, and time. Recent research uses machine learning
algorithms, big data technologies, and the Internet of Things to propose
stable and efficient models for the stated purpose. This review paper focuses on
studying and compiling recent research in this field and emphasizes the data
sources, monitoring, and forecasting models. The main objective of this paper
is to provide insight into the research underway to improve the various
aspects of air pollution models. Further, it also casts light on various
research issues and challenges.
( 2
min )
Successful deployment of artificial intelligence (AI) in various settings has
led to numerous positive outcomes for individuals and society. However, AI
systems have also been shown to harm parts of the population due to biased
predictions. We take a closer look at AI fairness and analyse how lack of AI
fairness can lead to deepening of biases over time and act as a social
stressor. If the issues persist, it could have undesirable long-term
implications on society, reinforced by interactions with other risks. We
examine current strategies for improving AI fairness, assess their limitations
in terms of real-world deployment, and explore potential paths forward to
ensure we reap AI's benefits without harming significant parts of society.
( 2
min )
Advances in mobile communication capabilities open the door for closer
integration of pre-hospital and in-hospital care processes. For example,
medical specialists can be enabled to guide on-site paramedics and can, in
turn, be supplied with live vitals or visuals. Consolidating such
performance-critical applications with the highly complex workings of mobile
communications requires solutions both reliable and efficient, yet easy to
integrate with existing systems. This paper explores the application of Deep
Deterministic Policy Gradient (DDPG) methods for learning a communications
resource scheduling algorithm with special regard to priority users. Unlike
the popular Deep-Q-Network methods, DDPG is able to produce
continuous-valued output. With light post-processing, the resulting scheduler
is able to achieve high performance on a flexible sum-utility goal.
( 2
min )
We study the training dynamics of shallow neural networks, in a two-timescale
regime in which the stepsizes for the inner layer are much smaller than those
for the outer layer. In this regime, we prove convergence of the gradient flow
to a global optimum of the non-convex optimization problem in a simple
univariate setting. The number of neurons need not be asymptotically large for
our result to hold, distinguishing our result from popular recent approaches
such as the neural tangent kernel or mean-field regimes. Experimental
illustration is provided, showing that the stochastic gradient descent behaves
according to our description of the gradient flow and thus converges to a
global optimum in the two-timescale regime, but can fail outside of this
regime.
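A minimal numpy sketch of the regime (the target function, network width, and stepsizes are illustrative assumptions): train a shallow ReLU network by gradient descent with an inner-layer stepsize 100 times smaller than the outer-layer one.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20                                          # hidden width
w, b = rng.normal(size=m), rng.normal(size=m)   # inner layer
a = 0.1 * rng.normal(size=m)                    # outer layer
X = rng.uniform(-1.0, 1.0, size=200)
Y = np.abs(X)                                   # simple univariate target

eta_outer = 0.05
eta_inner = eta_outer / 100.0   # two-timescale: inner layer moves much slower

losses = []
for _ in range(2000):
    H = np.maximum(np.outer(X, w) + b, 0.0)  # ReLU features, shape (n, m)
    err = H @ a - Y                          # residual of sum_j a_j relu(w_j x + b_j)
    losses.append(0.5 * np.mean(err ** 2))
    mask = (H > 0).astype(float)
    grad_a = H.T @ err / len(X)
    grad_w = a * (mask.T @ (err * X)) / len(X)
    grad_b = a * (mask.T @ err) / len(X)
    a -= eta_outer * grad_a
    w -= eta_inner * grad_w
    b -= eta_inner * grad_b
```

The loss decreases steadily; the paper's result is that in this regime the corresponding gradient flow provably reaches a global optimum in a simple univariate setting.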
( 2
min )
This paper introduces the QDQN-DPER framework to enhance the efficiency of
quantum reinforcement learning (QRL) in solving sequential decision tasks. The
framework incorporates prioritized experience replay and asynchronous training
into the training algorithm to reduce the high sampling complexities. Numerical
simulations demonstrate that QDQN-DPER outperforms the baseline distributed
quantum Q learning with the same model architecture. The proposed framework
holds potential for more complex tasks while maintaining training efficiency.
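The classical prioritized experience replay rule that QDQN-DPER incorporates can be sketched in its standard, non-quantum form (the exponent and priority values are illustrative):

```python
import numpy as np

def per_probs(priorities, alpha=0.6):
    """Prioritized replay: sample transition i with P(i) ~ priority_i**alpha."""
    p = np.asarray(priorities, dtype=float) ** alpha
    return p / p.sum()

probs = per_probs([0.1, 0.5, 2.0, 8.0])    # e.g. absolute TD errors
batch = np.random.default_rng(0).choice(len(probs), size=32, p=probs)
```

High-TD-error transitions are replayed more often; importance-sampling corrections (omitted here) then debias the resulting updates.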
( 2
min )
We discuss the discontinuities that arise when mapping unordered objects to
neural network outputs of fixed permutation, referred to as the responsibility
problem. Prior work has proved the existence of the issue by identifying a
single discontinuity. Here, we show that discontinuities under such models are
uncountably infinite, motivating further research into neural networks for
unordered data.
( 2
min )
Prompt-based learning reformulates downstream tasks as cloze problems by
combining the original input with a template. This technique is particularly
useful in few-shot learning, where a model is trained on a limited amount of
data. However, the limited templates and text used in few-shot prompt-based
learning still leave significant room for performance improvement.
Additionally, existing methods using model ensembles can constrain model
efficiency. To address these issues, we propose an augmentation method called
MixPro, which augments both the vanilla input text and the templates through
token-level, sentence-level, and epoch-level Mixup strategies. We conduct
experiments on five few-shot datasets, and the results show that MixPro
outperforms other augmentation baselines, improving model performance by an
average of 5.08% compared to before augmentation.
( 2
min )
Multicalibration is a notion of fairness that aims to provide accurate
predictions across a large set of groups. Multicalibration is known to be a
different goal than loss minimization, even for simple predictors such as
linear functions. In this note, we show that for (almost all) large neural
network sizes, optimally minimizing squared error leads to multicalibration.
Our results are about representational aspects of neural networks, and not
about algorithmic or sample complexity considerations. Previous such results
were known only for predictors that were nearly Bayes-optimal and were
therefore representation independent. We emphasize that our results do not
apply to specific algorithms for optimizing neural networks, such as SGD, and
they should not be interpreted as "fairness comes for free from optimizing
neural networks".
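For reference, a common formalization of multicalibration (the tolerance parameter and conditioning may differ from the note's exact definition): a predictor $f$ is $\alpha$-multicalibrated with respect to a collection of groups $\mathcal{C}$ if

$$\bigl|\,\mathbb{E}\bigl[\, y - f(x) \;\big|\; f(x) = v,\ x \in S \,\bigr]\bigr| \;\le\; \alpha \qquad \text{for all } S \in \mathcal{C} \text{ and all values } v.$$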
( 2
min )
Many machine learning methods assume that the training and test data follow
the same distribution. However, in the real world, this assumption is very
often violated. In particular, the phenomenon that the marginal distribution of
the data changes is called covariate shift, one of the most important research
topics in machine learning. We show that the well-known family of covariate
shift adaptation methods is unified in the framework of information geometry.
Furthermore, we show that parameter search for the geometrically generalized
covariate shift adaptation method can be performed efficiently. Numerical
experiments show that our generalization can achieve better performance than
the existing methods it encompasses.
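The classical baseline that such adaptation methods build on is importance weighting, which reweights training losses by the density ratio $w(x) = p_{\text{test}}(x)/p_{\text{train}}(x)$. A minimal sketch (names illustrative; the paper's information-geometric generalization goes beyond this):

```python
import numpy as np

def importance_weighted_loss(losses, p_test, p_train):
    """Importance-weighted empirical risk under covariate shift.

    Reweights each per-sample loss by w(x) = p_test(x) / p_train(x),
    the textbook covariate shift correction.
    """
    w = p_test / p_train
    return np.mean(w * losses)

losses = np.array([1.0, 2.0, 3.0])
p_train = np.array([0.5, 0.3, 0.2])
p_test = np.array([0.2, 0.3, 0.5])
# weights 0.4, 1.0, 2.5 -> weighted losses 0.4, 2.0, 7.5 -> mean 3.3
print(importance_weighted_loss(losses, p_test, p_train))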
( 2
min )
The recent advances in representation learning inspire us to take on the
challenging problem of unsupervised image classification tasks in a principled
way. We propose ContraCluster, an unsupervised image classification method that
combines clustering with the power of contrastive self-supervised learning.
ContraCluster consists of three stages: (1) contrastive self-supervised
pre-training (CPT), (2) contrastive prototype sampling (CPS), and (3)
prototype-based semi-supervised fine-tuning (PB-SFT). CPS can select highly
accurate, categorically prototypical images in an embedding space learned by
contrastive learning. We use sampled prototypes as noisy labeled data to
perform semi-supervised fine-tuning (PB-SFT), leveraging small prototypes and
large unlabeled data to further enhance the accuracy. We demonstrate
empirically that ContraCluster achieves new state-of-the-art results for
standard benchmark datasets including CIFAR-10, STL-10, and ImageNet-10. For
example, ContraCluster achieves about 90.8% accuracy for CIFAR-10, which
outperforms DAC (52.2%), IIC (61.7%), and SCAN (87.6%) by a large margin.
Without any labels, ContraCluster achieves 90.8% accuracy, comparable to the
95.8% of its best supervised counterpart.
( 2
min )
Sea surface temperature (SST) is uniquely important to the Earth's atmosphere
since its dynamics are a major force in shaping local and global climate and
profoundly affect our ecosystems. Accurate forecasting of SST brings
significant economic and social implications, for example, better preparation
for extreme weather such as severe droughts or tropical cyclones months ahead.
However, such a task faces unique challenges due to the intrinsic complexity
and uncertainty of ocean systems. Recently, deep learning techniques, such as
graph neural networks (GNNs), have been applied to address this task. Even
though these methods have had some success, they often struggle to capture the
dynamic spatiotemporal dependencies between signals. To address this problem,
this paper proposes a novel static and dynamic
learnable personalized graph convolution network (SD-LPGC). Specifically, two
graph learning layers are first constructed to respectively model the stable
long-term and short-term evolutionary patterns hidden in the multivariate SST
signals. Then, a learnable personalized convolution layer is designed to fuse
this information. Our experiments on real SST datasets demonstrate the
state-of-the-art performances of the proposed approach on the forecasting task.
( 2
min )
Federated Learning (FL) aims to train a machine learning (ML) model in a
distributed fashion to strengthen data privacy with limited data migration
costs. It is a distributed learning framework naturally suitable for
privacy-sensitive medical imaging datasets. However, most current FL-based
medical imaging works assume silos have ground truth labels for training. In
practice, label acquisition in the medical field is challenging as it often
requires extensive labor and time costs. To address this challenge and leverage
the unannotated data silos to improve modeling, we propose an alternate
training-based framework, Federated Alternate Training (FAT), that alternates
training between annotated data silos and unannotated data silos. Annotated
data silos exploit annotations to learn a reasonable global segmentation model.
Meanwhile, unannotated data silos use the global segmentation model as a target
model to generate pseudo labels for self-supervised learning. We evaluate the
performance of the proposed framework on two naturally partitioned Federated
datasets, KiTS19 and FeTS2021, and show its promising performance.
( 2
min )
Parkinson's disease (PD) has been found to affect 1 out of every 1000 people,
with a higher prevalence in the population above 60 years. Leveraging
wearable-systems to find accurate biomarkers for diagnosis has become the need
of the hour, especially for a neurodegenerative condition like Parkinson's.
This work aims at focusing on early-occurring, common symptoms, such as motor
and gait related parameters to arrive at a quantitative analysis on the
feasibility of an economical and robust wearable device. The PPMI Gait dataset,
a subset of the Parkinson's Progression Markers Initiative (PPMI), has been
utilised for feature selection after a thorough analysis with various Machine
Learning algorithms. The identified influential features have then been used to
test real-time data for early detection of Parkinson's syndrome, with a model
accuracy of 91.9%.
( 2
min )
We apply Bayesian optimization and reinforcement learning to a problem in
topology: the question of when a knot bounds a ribbon disk. This question is
relevant in an approach to disproving the four-dimensional smooth Poincar\'e
conjecture; using our programs, we rule out many potential counterexamples to
the conjecture. We also show that the programs are successful in detecting many
ribbon knots in the range of up to 70 crossings.
( 2
min )
Precise estimation of cross-correlation or similarity between two random
variables lies at the heart of signal detection, hyperdimensional computing,
associative memories, and neural networks. Although a vast literature exists on
different methods for estimating cross-correlations, the question of what is the
best and simplest method to estimate cross-correlations using finite samples
remains unclear. In this paper, we first argue that the standard empirical
approach might not be the optimal method even though the estimator exhibits
uniform convergence to the true cross-correlation. Instead, we show that there
exists a large class of simple non-linear functions that can be used to
construct cross-correlators with a higher signal-to-noise ratio (SNR). To
demonstrate this, we first present a general mathematical framework using
Price's Theorem that allows us to analyze cross-correlators constructed using a
mixture of piece-wise linear functions. Using this framework and
high-dimensional embedding, we show that some of the most promising
cross-correlators are based on Huber's loss functions, margin-propagation (MP)
functions, and the log-sum-exp functions.
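To make the idea concrete, here is a toy comparison (illustrative only; the paper's estimators, analyzed through Price's Theorem, are more general) between the standard empirical correlator and one built from a piece-wise linear, Huber-type clipping nonlinearity, which saturates the influence of heavy-tailed samples:

```python
import numpy as np

def empirical_corr(x, y):
    """Standard empirical cross-correlator, estimating E[x y]."""
    return np.mean(x * y)

def clipped_corr(x, y, delta=1.0):
    """Correlator built from a piece-wise linear (Huber-type) clipping
    nonlinearity: samples beyond +/- delta are saturated before the
    product is averaged, limiting the impact of outliers."""
    cx = np.clip(x, -delta, delta)
    cy = np.clip(y, -delta, delta)
    return np.mean(cx * cy)

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.standard_t(df=2, size=10_000)  # heavy-tailed noise
print(empirical_corr(x, y), clipped_corr(x, y))
```

Repeating the experiment over many trials and comparing the variance of the two estimates is one way to see the SNR advantage of the clipped correlator under heavy-tailed noise.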
( 2
min )
We extend the global convergence result of Chatterjee
\cite{chatterjee2022convergence} by considering the stochastic gradient descent
(SGD) for non-convex objective functions. With minimal additional assumptions
that can be realized by finitely wide neural networks, we prove that if we
initialize inside a local region where the \L{}ajasiewicz condition holds, with
a positive probability, the stochastic gradient iterates converge to a global
minimum inside this region. A key component of our proof is to ensure that the
whole trajectories of SGD stay inside the local region with a positive
probability. For that, we assume the SGD noise scales with the objective
function, which is called machine learning noise and is achievable in many real
examples. Furthermore, we provide a negative argument to show why using the
boundedness of noise with Robbins-Monro type step sizes is not enough to keep
the key component valid.
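For context, one standard form of the Łojasiewicz (gradient-dominance) condition on a region $U$, with constants and exponent that are illustrative rather than the paper's exact assumptions, is

$$\|\nabla f(x)\| \;\ge\; c\,\bigl(f(x) - f^{*}\bigr)^{\theta} \qquad \text{for all } x \in U,$$

with $c > 0$ and $\theta \in [1/2, 1)$; the machine learning noise assumption can similarly be read as the SGD noise variance scaling with the objective, e.g. $\mathbb{E}\|\xi_t\|^2 \le \sigma^2 f(x_t)$ (again an illustrative form).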
( 2
min )
Spatiotemporal (ST) data collected by sensors can be represented as
multi-variate time series, which is a sequence of data points listed in an
order of time. Despite the vast amount of useful information, the ST data
usually suffer from the issue of missing or incomplete data, which also limits
its applications. Imputation is one viable solution and is often used to
preprocess the data for further applications. In practice, however,
spatiotemporal data imputation is quite difficult due to the complexity of
spatiotemporal dependencies and dynamic changes in the traffic network, and it
is a crucial preprocessing task for further applications. Existing approaches
mostly only capture the temporal dependencies in time series or static spatial
dependencies. They fail to directly model the spatiotemporal dependencies, and
the representation ability of the models is relatively limited.
( 2
min )
Running complex sets of machine learning experiments is challenging and
time-consuming due to the lack of a unified framework. This leaves researchers
forced to spend time implementing necessary features such as parallelization,
caching, and checkpointing themselves instead of focusing on their project. To
simplify the process, in this paper, we introduce Memento, a Python package
that is designed to aid researchers and data scientists in the efficient
management and execution of computationally intensive experiments. Memento has
the capacity to streamline any experimental pipeline by providing a
straightforward configuration matrix and the ability to concurrently run
experiments across multiple threads. A demonstration of Memento is available
at: https://wickerlab.org/publication/memento.
( 2
min )
In this work we establish an algorithm and distribution independent
non-asymptotic trade-off between the model size, excess test loss, and training
loss of linear predictors. Specifically, we show that models that perform well
on the test data (have low excess loss) are either "classical" -- have training
loss close to the noise level, or are "modern" -- have a much larger number of
parameters compared to the minimum needed to fit the training data exactly.
We also provide a more precise asymptotic analysis when the limiting spectral
distribution of the whitened features is Marchenko-Pastur. Remarkably, while
the Marchenko-Pastur analysis is far more precise near the interpolation peak,
where the number of parameters is just enough to fit the training data, it
coincides exactly with the distribution independent bound as the level of
overparametrization increases.
( 2
min )
We study the training dynamics of shallow neural networks, in a two-timescale
regime in which the stepsizes for the inner layer are much smaller than those
for the outer layer. In this regime, we prove convergence of the gradient flow
to a global optimum of the non-convex optimization problem in a simple
univariate setting. The number of neurons need not be asymptotically large for
our result to hold, distinguishing our result from popular recent approaches
such as the neural tangent kernel or mean-field regimes. Experimental
illustration is provided, showing that the stochastic gradient descent behaves
according to our description of the gradient flow and thus converges to a
global optimum in the two-timescale regime, but can fail outside of this
regime.
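The regime can be sketched numerically: run SGD on a toy shallow ReLU network with a much smaller stepsize for the inner layer than for the outer one. All stepsizes, the network size, and the target function below are illustrative choices, not the paper's exact setting:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Shallow 1-D network f(x) = a . relu(w * x) with 4 hidden neurons.
w = rng.normal(size=4)               # inner-layer weights (slow)
a = rng.normal(size=4)               # outer-layer weights (fast)
eta_outer, eta_inner = 5e-2, 1e-4    # two timescales: inner << outer

for step in range(2000):
    x = rng.normal()
    y = 2.0 * relu(x)                           # toy target function
    h = relu(w * x)                             # hidden activations
    err = a @ h - y                             # residual
    a -= eta_outer * err * h                    # fast outer update
    w -= eta_inner * err * a * (w * x > 0) * x  # slow inner update

x_test = np.linspace(-1.0, 1.0, 11)
pred = np.array([a @ relu(w * xi) for xi in x_test])
mse = np.mean((pred - 2.0 * relu(x_test)) ** 2)
print(f"test MSE after two-timescale SGD: {mse:.4f}")
```

In the extreme limit the inner layer is effectively frozen while the outer layer converges for each fixed inner configuration, which is what makes the convergence analysis tractable.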
( 2
min )
Repo: https://github.com/h2oai/h2ogpt
From the repo:
- Open-source repository with fully permissive, commercially usable code, data and models
- Code for preparing large open-source datasets as instruction datasets for fine-tuning of large language models (LLMs), including prompt engineering
- Code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi node)
- Code to run a chatbot on a GPU server, with shareable end-point with Python client API
- Code to evaluate and compare the performance of fine-tuned LLMs
submitted by /u/luizluiz
( 43
min )
Code & Demo: https://github.com/z-x-yang/Segment-and-Track-Anything
WebUI App is also available
submitted by /u/liulei-li
( 43
min )
The ability to effectively handle and process enormous amounts of documents has become essential for enterprises in the modern world. Due to the continuous influx of information that all enterprises deal with, manually classifying documents is no longer a viable option. Document classification models can automate the procedure and help organizations save time and resources. […]
( 10
min )
Businesses are increasingly using machine learning (ML) to make near-real-time decisions, such as placing an ad, assigning a driver, recommending a product, or even dynamically pricing products and services. ML models make predictions given a set of input data known as features, and data scientists easily spend more than 60% of their time designing and […]
( 15
min )
This is a guest post co-written with Fred Wu from Sportradar. Sportradar is the world’s leading sports technology company, at the intersection between sports, media, and betting. More than 1,700 sports federations, media outlets, betting operators, and consumer platforms across 120 countries rely on Sportradar knowhow and technology to boost their business. Sportradar uses data […]
( 10
min )
MIT researchers exhibit a new advancement in autonomous drone navigation, using brain-inspired liquid neural networks that excel in out-of-distribution scenarios.
( 9
min )
Shanghai is once again showing why it’s called the “Magic City” as more than 1,000 exhibitors from 20 countries dazzle the automotive world this week at the highly anticipated International Automobile Industry Exhibition. With nearly 1,500 vehicles on display, the 20th edition of Auto Shanghai is showcasing the newest AI-powered cars and mobility solutions using Read article >
( 8
min )
This week’s In the NVIDIA Studio artists specializing in 3D, Gianluca Squillace and Pasquale Scionti, benefitted from just that — in their individual work and in collaborating to construct the final scene for their project, Cold Inside Diorama.
( 7
min )
For many people, opening door handles or moving a pen between their fingers is a movement that happens multiple times a day, often without much thought. For a robot, however, these movements aren’t always so easy. In reinforcement learning, robots learn to perform tasks by exploring their environments, receiving signals along the way that indicate […]
The post Unifying learning from preferences and demonstration via a ranking game for imitation learning appeared first on Microsoft Research.
( 15
min )
I developed a simple traffic simulator with five cars, and I want to improve the
cars' driving ability using basic reinforcement learning.
I used tkinter to render and display the maps, but I found that tkinter can't
support maps with more than 20 rows and columns on my personal machine (Mac M1
mini), and I don't know how to display bigger maps with more rows and columns.
I would be very grateful for any suggestions.
github repositories: https://github.com/wa008/reinforcement-learning
submitted by /u/waa007
( 42
min )
In this paper, we introduce four main novelties: First, we present a new way
of handling the topology problem of normalizing flows. Second, we describe a
technique to enforce certain classes of boundary conditions onto normalizing
flows. Third, we introduce the I-Spline bijection, which, similar to previous
work, leverages splines but, in contrast to those works, can be made
arbitrarily often differentiable. And finally, we use these techniques to
create Waveflow, an Ansatz for the one-space-dimensional multi-particle
fermionic wave functions in real space based on normalizing flows, that can be
efficiently trained with Variational Quantum Monte Carlo without the need for
MCMC nor estimation of a normalization constant. To enforce the necessary
anti-symmetry of fermionic wave functions, we train the normalizing flow only
on the fundamental domain of the permutation group, which effectively reduces
it to a boundary value problem.
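The antisymmetrization described above can be written in a standard form (notation assumed here, not taken verbatim from the paper): the flow is trained only on the fundamental domain $x_1 \le x_2 \le \dots \le x_N$ of the permutation group $S_N$, and the wave function is extended to all of real space via

$$\psi\bigl(x_{\sigma(1)}, \dots, x_{\sigma(N)}\bigr) \;=\; \operatorname{sgn}(\sigma)\,\psi(x_1, \dots, x_N), \qquad \sigma \in S_N.$$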
( 2
min )
The article reviews significant advances in networked signal and information
processing, which over the last 25 years have extended decision making,
inference, optimization, control, and learning to the increasingly
ubiquitous environments of distributed agents. As these interacting agents
cooperate, new collective behaviors emerge from local decisions and actions.
Moreover, and significantly, theory and applications show that networked
agents, through cooperation and sharing, are able to match the performance of
cloud or federated solutions, while offering the potential for improved
privacy, increasing resilience, and saving resources.
( 2
min )
This paper proposes a novel centralized training and distributed execution
(CTDE)-based multi-agent deep reinforcement learning (MADRL) method for
controlling multiple unmanned aerial vehicles (UAVs) in autonomous mobile access
applications. For this purpose, a single neural network is utilized in
centralized training for cooperation among multiple agents while maximizing the
total quality of service (QoS) in mobile access applications.
( 2
min )
Consumers' privacy is a major concern in Smart Grids (SGs) due to the
sensitivity of energy data, particularly when used to train machine learning
models for different services. These data-driven models often require huge
amounts of data to achieve acceptable performance, leading in most cases to
risks of privacy leakage. By pushing the training to the edge, Federated
Learning (FL) offers a good compromise between privacy preservation and the
predictive performance of these models. The current paper presents an overview
of FL applications in SGs while discussing their advantages and drawbacks,
mainly in load forecasting, electric vehicles, fault diagnoses, load
disaggregation and renewable energies. In addition, an analysis of main design
trends and possible taxonomies is provided considering data partitioning, the
communication topology, and security mechanisms. Towards the end, an overview
of main challenges facing this technology and potential future directions is
presented.
( 2
min )
This paper presents the approach and results of USC SAIL's submission to the
Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting
relapses in psychotic patients. Relapse prediction has proven to be
challenging, primarily due to the heterogeneity of symptoms and responses to
treatment between individuals. We address these challenges by investigating the
use of sleep behavior features to estimate relapse days as outliers in an
unsupervised machine learning setting. We extract informative features from
human activity and heart rate data collected in the wild, and evaluate various
combinations of feature types and time resolutions. We found that short-time
sleep behavior features outperformed their awake counterparts and larger time
intervals. Our submission was ranked 3rd in the Task's official leaderboard,
demonstrating the potential of such features as an objective and non-invasive
predictor of psychotic relapses.
( 2
min )
Fetal standard scan plane detection during 2-D mid-pregnancy examinations is
a highly complex task, which requires extensive medical knowledge and years of
training. Although deep neural networks (DNN) can assist inexperienced
operators in these tasks, their lack of transparency and interpretability limit
their application. Although some researchers have committed to visualizing
the decision process of DNNs, most of them focus only on pixel-level
features and do not take medical prior knowledge into account. In this
work, we propose an interpretable framework based on key medical concepts,
which provides explanations from the perspective of clinicians' cognition.
Moreover, we utilize a concept-based graph convolutional network (GCN) to
construct the relationships between key medical concepts. Extensive
experimental analysis on a private dataset has shown that the proposed method
provides easy-to-understand insights about reasoning results for clinicians.
( 2
min )
Self-supervised monocular depth estimation approaches suffer not only from
scale ambiguity but also infer temporally inconsistent depth maps w.r.t. scale.
While disambiguating scale during training is not possible without some kind of
ground truth supervision, having scale consistent depth predictions would make
it possible to calculate scale once during inference as a post-processing step
and use it over-time. With this as a goal, a set of temporal consistency losses
that minimize pose inconsistencies over time are introduced. Evaluations show
that introducing these constraints not only reduces depth inconsistencies but
also improves the baseline performance of depth and ego-motion prediction.
( 2
min )
In this paper, we primarily focus on understanding the data preprocessing
pipeline for DNN Training in the public cloud. First, we run experiments to
test the performance implications of the two major data preprocessing methods
using either raw data or record files. The preliminary results show that data
preprocessing is a clear bottleneck, even with the most efficient software and
hardware configuration enabled by NVIDIA DALI, a highly optimized data
preprocessing library. Second, we identify the potential causes, exercise a
variety of optimization methods, and present their pros and cons. We hope this
work will shed light on the new co-design of ``data storage, loading pipeline''
and ``training framework'' and flexible resource configurations between them so
that the resources can be fully exploited and performance can be maximized.
( 2
min )
In this paper, we extend the original Neural Collapse phenomenon by proving
the Generalized Neural Collapse hypothesis. We obtain a Grassmannian Frame
structure from the optimization and generalization of classification. This
structure maximally separates the features of every two classes on a sphere and
does not require a larger feature dimension than the number of classes. Out of
curiosity about the symmetry of the Grassmannian Frame, we conduct experiments
to explore whether
models with different Grassmannian Frames have different performance. As a
result, we discover the Symmetric Generalization phenomenon. We provide a
theorem to explain Symmetric Generalization of permutation. However, the
question of why different directions of features can lead to such different
generalization is still open for future investigation.
( 2
min )
Robotic grasping in highly noisy environments presents complex challenges,
especially with limited prior knowledge about the scene. In particular,
identifying good grasping poses with Bayesian inference becomes difficult due
to two reasons: i) generating data from uninformative priors proves to be
inefficient, and ii) the posterior often entails a complex distribution defined
on a Riemannian manifold. In this study, we explore the use of implicit
representations to construct scene-dependent priors, thereby enabling the
application of efficient simulation-based Bayesian inference algorithms for
determining successful grasp poses in unstructured environments. Results from
both simulation and physical benchmarks showcase the high success rate and
promising potential of this approach.
( 2
min )
In this paper, we describe a method for estimating the joint probability
density from data samples by assuming that the underlying distribution can be
decomposed as a mixture of product densities with few mixture components. Prior
works have used such a decomposition to estimate the joint density from
lower-dimensional marginals, which can be estimated more reliably with the same
number of samples. We combine two key ideas: dictionaries to represent 1-D
densities, and random projections to estimate the joint distribution from 1-D
marginals, explored separately in prior work. Our algorithm benefits from
improved sample complexity over the previous dictionary-based approach by using
1-D marginals for reconstruction. We evaluate the performance of our method on
estimating synthetic probability densities and compare it with the previous
dictionary-based approach and Gaussian Mixture Models (GMMs). Our algorithm
outperforms these other approaches in all the experimental settings.
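The decomposition underlying such methods can be sketched as (notation here is generic, not the paper's):

$$p(x_1, \dots, x_D) \;\approx\; \sum_{k=1}^{K} w_k \prod_{d=1}^{D} p_{k,d}(x_d), \qquad w_k \ge 0,\ \sum_{k} w_k = 1,$$

with each one-dimensional factor $p_{k,d}$ represented in a dictionary and recovered from (randomly projected) 1-D marginals.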
( 2
min )
This study presents a benchmark for evaluating action-constrained
reinforcement learning (RL) algorithms. In action-constrained RL, each action
taken by the learning system must comply with certain constraints. These
constraints are crucial for ensuring the feasibility and safety of actions in
real-world systems. We evaluate existing algorithms and their novel variants
across multiple robotics control environments, encompassing multiple action
constraint types. Our evaluation provides the first in-depth perspective of the
field, revealing surprising insights, including the effectiveness of a
straightforward baseline approach. The benchmark problems and associated code
utilized in our experiments are made available online at
github.com/omron-sinicx/action-constrained-RL-benchmark for further research
and development.
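The simplest instance of an action constraint is a box on each action dimension, which a raw policy output can be projected onto by clipping. This is only an illustrative baseline; the benchmark covers richer constraint types and enforcement methods (e.g., projection layers or optimization-based corrections):

```python
import numpy as np

def project_action(action, low, high):
    """Project a raw policy action onto box constraints by clipping.

    Illustrative baseline only: more general constraints (norm balls,
    linear constraints) require a proper projection or a constrained
    optimization step instead of elementwise clipping.
    """
    return np.clip(action, low, high)

raw = np.array([1.5, -0.2, -3.0])
safe = project_action(raw, -1.0, 1.0)
print(safe)  # each component now lies in [-1, 1]
```

Applying such a projection at every environment step guarantees feasibility, but how the projection interacts with learning is exactly what benchmarks of this kind evaluate.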
( 2
min )
Trained computer vision models are assumed to solve vision tasks by imitating
human behavior learned from training labels. Most efforts in recent vision
research focus on measuring the model task performance using standardized
benchmarks. Limited work has been done to understand the perceptual difference
between humans and machines. To fill this gap, our study first quantifies and
analyzes the statistical distributions of mistakes from the two sources. We
then explore human vs. machine expertise after ranking tasks by difficulty
levels. Even when humans and machines have similar overall accuracies, the
distribution of answers may vary. Leveraging the perceptual difference between
humans and machines, we empirically demonstrate a post-hoc human-machine
collaboration that outperforms humans or machines alone.
( 2
min )
We present LTC-SE, an improved version of the Liquid Time-Constant (LTC)
neural network algorithm originally proposed by Hasani et al. in 2021. This
algorithm unifies the Leaky-Integrate-and-Fire (LIF) spiking neural network
model with Continuous-Time Recurrent Neural Networks (CTRNNs), Neural Ordinary
Differential Equations (NODEs), and bespoke Gated Recurrent Units (GRUs). The
enhancements in LTC-SE focus on augmenting flexibility, compatibility, and code
organization, targeting the unique constraints of embedded systems with limited
computational resources and strict performance requirements. The updated code
serves as a consolidated class library compatible with TensorFlow 2.x, offering
comprehensive configuration options for LTCCell, CTRNN, NODE, and CTGRU
classes. We evaluate LTC-SE against its predecessors, showcasing the advantages
of our optimizations in user experience, Keras function compatibility, and code
clarity. These refinements expand the applicability of liquid neural networks
in diverse machine learning tasks, such as robotics, causality analysis, and
time-series prediction, and build on the foundational work of Hasani et al.
( 2
min )
Modern deep models for summarization attain impressive benchmark
performance, but they are prone to generating miscalibrated predictive
uncertainty. This means that they assign high confidence to low-quality
predictions, leading to compromised reliability and trustworthiness in
real-world applications. Probabilistic deep learning methods are common
solutions to the miscalibration problem. However, their relative effectiveness
in complex autoregressive summarization tasks is not well understood. In this
work, we thoroughly investigate different state-of-the-art probabilistic
methods' effectiveness in improving the uncertainty quality of the neural
summarization models, across three large-scale benchmarks with varying
difficulty. We show that the probabilistic methods consistently improve the
model's generation and uncertainty quality, leading to improved selective
generation performance (i.e., abstaining from low-quality summaries) in
practice. We also reveal notable failure patterns of probabilistic methods
widely adopted in the NLP community (e.g., Deep Ensemble and Monte Carlo
Dropout), underscoring the importance of choosing an appropriate method for
the data setting.
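Selective generation, as evaluated here, can be sketched in a few lines: rank predictions by model confidence, abstain from the least-confident fraction, and measure the quality of what remains. The confidence and quality values below are illustrative placeholders, not results from the paper.

```python
# Sketch of selective generation: abstain from the least-confident
# summaries and report the mean quality of the retained ones.
# All numbers here are made up for illustration.

def selective_generation(confidences, qualities, abstain_frac):
    """Drop the abstain_frac least-confident predictions and return
    the mean quality of the retained ones."""
    paired = sorted(zip(confidences, qualities), reverse=True)
    n_keep = max(1, int(len(paired) * (1 - abstain_frac)))
    kept = paired[:n_keep]
    return sum(q for _, q in kept) / len(kept)

confidences = [0.9, 0.8, 0.3, 0.95, 0.2, 0.7]
qualities   = [0.8, 0.7, 0.2, 0.9,  0.1, 0.6]
# With well-calibrated confidence, abstaining raises average quality.
print(selective_generation(confidences, qualities, abstain_frac=0.0))
print(selective_generation(confidences, qualities, abstain_frac=0.5))
```

A miscalibrated model breaks exactly this link between confidence and quality, which is what the failure patterns above refer to.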
( 2
min )
In this paper we study a class of constrained minimax problems. In
particular, we propose a first-order augmented Lagrangian method for solving
them, whose subproblems turn out to be much simpler structured minimax
problems that are suitably solved by a first-order method recently developed in
[26] by the authors. Under some suitable assumptions, an \emph{operation
complexity} of ${\cal O}(\varepsilon^{-4}\log\varepsilon^{-1})$, measured by
its fundamental operations, is established for the first-order augmented
Lagrangian method for finding an $\varepsilon$-KKT solution of the constrained
minimax problems.
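As a rough sketch of the setup (using generic notation that is our assumption, not necessarily the paper's), an augmented Lagrangian method for a constrained minimax problem $\min_x \max_y f(x,y)$ subject to $c(x) \le 0$ repeatedly solves penalized subproblems of the form:

```latex
% Augmented Lagrangian subproblem at multiplier \lambda \ge 0, penalty \rho > 0
% (generic form for inequality constraints; notation assumed):
\min_x \max_y \; f(x,y)
  + \frac{\rho}{2}\,\Big\| \big[\tfrac{\lambda}{\rho} + c(x)\big]_+ \Big\|^2
  - \frac{\|\lambda\|^2}{2\rho}
% followed by the multiplier update \lambda \leftarrow [\lambda + \rho\, c(x)]_+
```

Each such subproblem is an unconstrained minimax problem with the simpler structure the abstract mentions, which is where the first-order solver of [26] enters.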
( 2
min )
The Linear-Quadratic Regulation (LQR) problem with unknown system parameters
has been widely studied, but it has remained unclear whether
$\tilde{\mathcal{O}}(\sqrt{T})$ regret, which is the best known dependence on
time, can be achieved almost surely. In this paper, we propose an adaptive LQR
controller with an almost sure $\tilde{\mathcal{O}}(\sqrt{T})$ regret upper
bound. The
controller features a circuit-breaking mechanism, which circumvents potential
safety breach and guarantees the convergence of the system parameter estimate,
but is shown to be triggered only finitely often and hence has negligible
effect on the asymptotic performance of the controller. The proposed controller
is also validated via simulation on Tennessee Eastman Process~(TEP), a commonly
used industrial process example.
( 2
min )
In this paper, a critical bibliometric analysis study is conducted, coupled
with an extensive literature survey on recent developments and associated
applications in machine learning research with a perspective on Africa. The
presented bibliometric analysis study consists of 2761 machine learning-related
documents, of which 98% were articles with at least 482 citations published in
903 journals during the past 30 years. Furthermore, the collated documents were
retrieved from the Science Citation Index EXPANDED, comprising research
publications from 54 African countries between 1993 and 2021. The bibliometric
study visualizes the current landscape and future trends in machine learning
research and its applications, with the aim of facilitating future
collaborative research and knowledge exchange among authors from different
research institutions scattered across the African continent.
( 2
min )
Chen et al. [Chen2022] recently published the article 'Fast and scalable
search of whole-slide images via self-supervised deep learning' in Nature
Biomedical Engineering. The authors call their method 'self-supervised image
search for histology', SISH for short. We express our concerns that SISH is an
incremental modification of Yottixel, has used MinMax binarization but does not
cite the original works, and is based on a misnomer 'self-supervised image
search'. We also point to several other concerns regarding the experiments and
comparisons performed by Chen et al.
( 2
min )
Adaptation-relevant predictions of climate change are often derived by
combining climate model simulations in a multi-model ensemble. Model evaluation
methods used in performance-based ensemble weighting schemes have limitations
in the context of high-impact extreme events. We introduce a locally
time-invariant method for evaluating climate model simulations with a focus on
assessing the simulation of extremes. We explore the behaviour of the proposed
method in predicting extreme heat days in Nairobi and provide comparative
results for eight additional cities.
( 2
min )
Enabling resilient autonomous motion planning requires robust predictions of
surrounding road users' future behavior. In response to this need and the
associated challenges, we introduce our model titled MTP-GO. The model encodes
the scene using temporal graph neural networks to produce the inputs to an
underlying motion model. The motion model is implemented using neural ordinary
differential equations where the state-transition functions are learned with
the rest of the model. Multimodal probabilistic predictions are obtained by
combining the concept of mixture density networks and Kalman filtering. The
results illustrate the predictive capabilities of the proposed model across
various data sets, outperforming several state-of-the-art methods on a number
of metrics.
( 2
min )
Nowadays, face recognition systems surpass human performance on several
datasets. However, there are still edge cases that the machine cannot
correctly classify. This paper investigates the effect of combining machine
and human operators in the face verification task. First, we look closer at
the edge cases for several state-of-the-art models to discover challenging
settings common across datasets. Then, we conduct a study with 60 participants
on these
selected tasks with humans and provide an extensive analysis. Finally, we
demonstrate that combining machine and human decisions can further improve the
performance of state-of-the-art face verification systems on various benchmark
datasets. Code and data are publicly available on GitHub.
( 2
min )
Stochastic gradient Langevin dynamics (SGLD) is a useful methodology for
sampling from probability distributions. This paper provides a finite-sample
analysis of a passive stochastic gradient Langevin dynamics algorithm (PSGLD)
designed to achieve inverse reinforcement learning. By "passive", we mean that
the noisy gradients available to the PSGLD algorithm (inverse learning process)
are evaluated at randomly chosen points by an external stochastic gradient
algorithm (forward learner). The PSGLD algorithm thus acts as a randomized
sampler which recovers the cost function being optimized by this external
process. Previous work has analyzed the asymptotic performance of this passive
algorithm using stochastic approximation techniques; in this work we analyze
the non-asymptotic performance. Specifically, we provide finite-time bounds on
the 2-Wasserstein distance between the passive algorithm and its stationary
measure, from which the reconstructed cost function is obtained.
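As context for readers unfamiliar with SGLD, here is a minimal sketch of (non-passive) Langevin sampling on a one-dimensional quadratic cost. The paper's PSGLD differs in that its gradients are evaluated at points chosen by an external forward learner, which this sketch deliberately does not model; the cost and step size below are illustrative choices.

```python
import random

# Minimal stochastic gradient Langevin dynamics (SGLD) on a 1-D quadratic
# cost f(theta) = (theta - 2)^2 / 2, whose gradient is (theta - 2).
# Iterates approximately sample from exp(-f), i.e. N(2, 1).

random.seed(0)

def noisy_grad(theta):
    # Stochastic gradient: true gradient plus observation noise.
    return (theta - 2.0) + random.gauss(0.0, 0.1)

def sgld(theta0, step, n_iters):
    theta, samples = theta0, []
    for _ in range(n_iters):
        # Langevin update: gradient step plus injected Gaussian noise
        # with variance 2 * step.
        theta += -step * noisy_grad(theta) + random.gauss(0.0, (2 * step) ** 0.5)
        samples.append(theta)
    return samples

samples = sgld(theta0=0.0, step=0.05, n_iters=5000)
burn_in = samples[1000:]
mean = sum(burn_in) / len(burn_in)
print(round(mean, 2))  # concentrates near the minimiser theta = 2
```

The 2-Wasserstein bounds of the paper quantify how fast the law of such iterates approaches the stationary measure.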
( 2
min )
There is an increasing interest in the development of new data-driven models
useful to assess the performance of communication networks. For many
applications, like network monitoring and troubleshooting, a data model is of
little use if it cannot be interpreted by a human operator. In this paper, we
present an extension of the Multivariate Big Data Analysis (MBDA) methodology,
a recently proposed interpretable data analysis tool. In this extension, we
propose a solution to the automatic derivation of features, a cornerstone step
for the application of MBDA when the amount of data is massive. The resulting
network monitoring approach allows us to detect and diagnose disparate network
anomalies, with a data-analysis workflow that combines the advantages of
interpretable and interactive models with the power of parallel processing. We
apply the extended MBDA to two case studies: UGR'16, a benchmark flow-based
real-traffic dataset for anomaly detection, and Dartmouth'18, the longest and
largest Wi-Fi trace known to date.
( 2
min )
Data - https://github.com/allenai/mmc4
( 43
min )
Data warehouses are at the heart of any organization’s technology ecosystem. The emergence of cloud technology has enabled data warehouses to offer capabilities such as cost-effective data storage, scalable computing and storage, utilization-based pricing, and fully managed service delivery. As data consumption increases and more people live and work remotely, companies are adopting modern data…
The post Why It’s Important to Change Misconceptions About Data Warehouse Technology appeared first on Data Science Central.
( 21
min )
Three years after the outbreak of the COVID-19 pandemic, the lingering impacts of the viral outbreak and the risk of another deadly pathogen spreading around the world remain. The pandemic challenged every health system in the world, stressing facilities, medical equipment suppliers, and medical personnel. Public health authorities tracked disease transmission, modeled forecasts across multiple…
The post How Informatics, ML, and AI Can Better Prepare the Healthcare Industry for the Next Global Pandemic appeared first on Data Science Central.
( 21
min )
Artificial Intelligence (AI) is sweeping the globe, leaving no stone unturned as it reshapes industries far and wide.
The post Harnessing the Power of OpenAI Technology: 5 Innovative Marketing Tools appeared first on Data Science Central.
( 20
min )
Large language models (LLMs) with billions of parameters are currently at the forefront of natural language processing (NLP). These models are shaking up the field with their incredible abilities to generate text, analyze sentiment, translate languages, and much more. With access to massive amounts of data, LLMs have the potential to revolutionize the way we […]
( 18
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML), enabling organizations to provide relevant information to customers and employees, when they need it. Amazon Kendra uses ML algorithms to enable users to use natural language queries to search for information scattered across multiple data souces in an enterprise, including commonly used document […]
( 7
min )
This post was co-written with Dave Gowel, CEO of RallyPoint. In his own words, “RallyPoint is an online social and professional network for veterans, service members, family members, caregivers, and other civilian supporters of the US armed forces. With two million members on the platform, the company provides a comfortable place for this deserving population […]
( 9
min )
Reliability managers and technicians in industrial environments such as manufacturing production lines, warehouses, and industrial plants are keen to improve equipment health and uptime to maximize product output and quality. Machine and process failures are often addressed by reactive activity after incidents happen or by costly preventive maintenance, where you run the risk of over-maintaining […]
( 16
min )
In the first two blog posts in this series, we presented our vision for Cloud Intelligence/AIOps (AIOps) research, and scenarios where innovations in AI technologies can help build and operate complex cloud platforms and services effectively and efficiently at scale. In this blog post, we dive deeper into our efforts to automatically manage large-scale cloud […]
The post Automatic post-deployment management of cloud applications appeared first on Microsoft Research.
( 15
min )
Sparked by the release of large AI models like AlexaTM, GPT, OpenChatKit, BLOOM, GPT-J, GPT-NeoX, FLAN-T5, OPT, Stable Diffusion, and ControlNet, the popularity of generative AI has seen a recent boom. Businesses are beginning to evaluate new cutting-edge applications of the technology in text, image, audio, and video generation that have the potential to revolutionize […]
( 18
min )
“Instead of focusing on the code, companies should focus on developing systematic engineering practices for improving data in ways that are reliable, efficient, and systematic. In other words, companies need to move from a model-centric approach to a data-centric approach.” – Andrew Ng A data-centric AI approach involves building AI systems with quality data involving […]
( 10
min )
As more businesses increase their online presence to serve their customers better, new fraud patterns are constantly emerging. In today’s ever-evolving digital landscape, where fraudsters are becoming more sophisticated in their tactics, detecting and preventing such fraudulent activities has become paramount for companies and financial institutions. Traditional rule-based fraud detection systems are capped in their […]
( 9
min )
RStudio on Amazon SageMaker is the industry’s first fully managed RStudio Workbench integrated development environment (IDE) in the cloud. You can quickly launch the familiar RStudio IDE and dial up and down the underlying compute resources without interrupting your work, making it easy to build machine learning (ML) and analytics solutions in R at scale. […]
( 7
min )
The dask release 2023.2.1 introduced a new shuffling method called P2P for dask.dataframe, making sorts, merges, and joins faster while using constant memory. This article describes the problem, the new solution, and the impact on performance.
https://medium.com/coiled-hq/shuffling-large-data-at-constant-memory-in-dask-bb683e92d70b
( 43
min )
At the Hannover Messe trade show this week, Siemens unveiled a digital model of next-generation FREYR Battery factories that was developed using NVIDIA technology. The model was created in part to highlight a strategic partnership announced Monday by Siemens and FREYR, with Siemens becoming FREYR’s preferred supplier in automation technology, enabling the Norway-based group to…
( 5
min )
Microsoft has made significant contributions to the prestigious USENIX NSDI’23 conference, which brings together experts in computer networks and distributed systems. A silver sponsor for the conference, Microsoft is a leader in developing innovative technologies for networking, and we are proud to have contributed to 30 papers accepted this year. Our team members also served […]
The post Microsoft at NSDI 2023: A commitment to advancing networking and distributed systems appeared first on Microsoft Research.
( 13
min )
This work addresses large dimensional covariance matrix estimation with
unknown mean. The empirical covariance estimator fails when dimension and
number of samples are proportional and tend to infinity, settings known as
Kolmogorov asymptotics. When the mean is known, Ledoit and Wolf (2004) proposed
a linear shrinkage estimator and proved its convergence under those
asymptotics. To the best of our knowledge, no formal proof has been proposed
when the mean is unknown. To address this issue, we propose a new estimator and
prove its quadratic convergence under the Ledoit and Wolf assumptions. Finally,
we show empirically that it outperforms other standard estimators.
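A minimal sketch of the kind of estimator being analyzed: linear shrinkage of the empirical covariance (computed with the estimated mean, the unknown-mean setting the abstract addresses) toward a scaled identity. The fixed shrinkage intensity `rho` is an illustrative assumption; Ledoit-Wolf-type estimators derive it from the data.

```python
# Linear shrinkage of the empirical covariance toward m * I, where m is
# the average variance. Pure-Python sketch with a fixed intensity rho.

def mean_vector(X):
    n, d = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(d)]

def sample_cov(X):
    # Empirical covariance with the estimated mean subtracted.
    n, d = len(X), len(X[0])
    mu = mean_vector(X)
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        for i in range(d):
            for j in range(d):
                C[i][j] += (row[i] - mu[i]) * (row[j] - mu[j]) / n
    return C

def shrink(C, rho):
    # (1 - rho) * C + rho * m * I.
    d = len(C)
    m = sum(C[i][i] for i in range(d)) / d
    return [[(1 - rho) * C[i][j] + (rho * m if i == j else 0.0)
             for j in range(d)] for i in range(d)]

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
S = shrink(sample_cov(X), rho=0.3)
print(S)
```

Shrinkage pulls the off-diagonal entries toward zero while leaving well-estimated variances largely intact, which is what stabilises the estimator when the dimension is comparable to the sample size.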
( 2
min )
We present a novel approach for black-box variational inference (VI) that
bypasses the difficulties of stochastic gradient ascent, including the task of
selecting step-sizes. Our
approach involves using a sequence of sample average approximation (SAA)
problems. SAA approximates the solution of stochastic optimization problems by
transforming them into deterministic ones. We use quasi-Newton methods and line
search to solve each deterministic optimization problem and present a heuristic
policy to automate hyperparameter selection. Our experiments show that our
method simplifies the VI problem and achieves faster performance than existing
methods.
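The core SAA idea can be sketched as follows: draw the base samples once and freeze them, which makes the reparameterised ELBO a deterministic function of the variational parameters. For clarity this sketch uses plain gradient ascent rather than the quasi-Newton methods and line search of the paper, and the target and variational family are illustrative assumptions: target p = N(3, 1), variational q = N(m, s) with z = m + s * eps.

```python
import random

# SAA for VI: freeze the base samples eps, then optimise the now
# deterministic ELBO of q = N(m, s) against the target N(3, 1).

random.seed(1)
eps = [random.gauss(0.0, 1.0) for _ in range(200)]  # drawn once (SAA)

def grads(m, s):
    # Deterministic ELBO gradients given the frozen samples:
    # ELBO(m, s) = -E[(z - 3)^2] / 2 + log s + const, z = m + s * eps.
    r = [m + s * e - 3.0 for e in eps]          # residuals z - 3
    dm = -sum(r) / len(r)
    ds = -sum(ri * e for ri, e in zip(r, eps)) / len(r) + 1.0 / s
    return dm, ds

m, s = 0.0, 1.0
for _ in range(500):
    dm, ds = grads(m, s)
    m += 0.1 * dm
    s += 0.1 * ds

print(round(m, 1), round(s, 1))  # close to the target mean 3 and std 1
```

Because the objective is deterministic, each SAA problem can be handed to any off-the-shelf deterministic optimiser, which is what removes the step-size-tuning difficulties of stochastic gradient ascent.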
( 2
min )
In data-driven stochastic optimization, model parameters of the underlying
distribution need to be estimated from data in addition to the optimization
task. Recent literature suggests the integration of the estimation and
optimization processes, by selecting model parameters that lead to the best
empirical objective performance. Such an integrated approach can be readily
shown to outperform simple ``estimate then optimize'' when the model is
misspecified. In this paper, we argue that when the model class is rich enough
to cover the ground truth, the performance ordering between the two approaches
is reversed for nonlinear problems in a strong sense. Simple ``estimate then
optimize'' outperforms the integrated approach in terms of stochastic dominance
of the asymptotic optimality gap, i.e., the mean, all other moments, and the
entire asymptotic distribution of the optimality gap are always better.
Analogous results also hold under constrained settings and when contextual
features are available. We also provide experimental findings to support our
theory.
( 2
min )
PAC-Bayes learning is an established framework to assess the generalisation
ability of a learning algorithm during the training phase. However, it remains
challenging to know whether PAC-Bayes is useful to understand, before training,
why the output of well-known algorithms generalise well. We positively answer
this question by expanding the \emph{Wasserstein PAC-Bayes} framework, briefly
introduced in \cite{amit2022ipm}. We provide new generalisation bounds
exploiting geometric assumptions on the loss function. Using our framework, we
prove, before any training, that the output of an algorithm from
\citet{lambert2022variational} has a strong asymptotic generalisation ability.
More precisely, we show that it is possible to incorporate optimisation results
within a generalisation framework, building a bridge between PAC-Bayes and
optimisation algorithms.
( 2
min )
Ultrasound is the primary modality to examine fetal growth during pregnancy,
but image quality can be affected by various factors. Quality
assessment is essential for controlling the quality of ultrasound images to
guarantee both the perceptual and diagnostic values. Existing automated
approaches often require heavy structural annotations and the predictions may
not necessarily be consistent with the assessment results by human experts.
Furthermore, the overall quality of a scan and the correlation between the
quality of frames should not be overlooked. In this work, we propose a
reinforcement learning framework powered by two hierarchical agents that
collaboratively learn to perform both frame-level and video-level quality
assessments. It is equipped with a specially-designed reward mechanism that
considers temporal dependency among frame quality and only requires sparse
binary annotations to train. Experimental results on a challenging fetal brain
dataset verify that the proposed framework could perform dual-level quality
assessment and its predictions correlate well with the subjective assessment
results.
( 2
min )
This paper considers the problem of testing the maximum in-degree of the
Bayes net underlying an unknown probability distribution $P$ over $\{0,1\}^n$,
given sample access to $P$. We show that the sample complexity of the problem
is $\tilde{\Theta}(2^{n/2}/\varepsilon^2)$. Our algorithm relies on a
testing-by-learning framework, previously used to obtain sample-optimal
testers; in order to apply this framework, we develop new algorithms for
``near-proper'' learning of Bayes nets, and high-probability learning under
$\chi^2$ divergence, which are of independent interest.
( 2
min )
We present the first $\varepsilon$-differentially private, computationally
efficient algorithm that estimates the means of product distributions over
$\{0,1\}^d$ accurately in total-variation distance, whilst attaining the
optimal sample complexity to within polylogarithmic factors. The prior work had
either solved this problem efficiently and optimally under weaker notions of
privacy, or had solved it optimally while having exponential running times.
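For contrast with the paper's optimal-sample-complexity algorithm, the classical baseline for this problem is simply the Laplace mechanism applied to the empirical means. Changing one record moves the mean vector by at most d/n in L1 norm, so per-coordinate Laplace noise of scale d/(n*eps) gives eps-differential privacy; the sketch below (with illustrative data) shows that baseline, which needs far more samples to be accurate.

```python
import math
import random

# eps-DP release of the coordinate-wise means of samples from {0,1}^d
# via the Laplace mechanism (classical baseline, not the paper's method).

random.seed(0)

def laplace(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_product_means(samples, eps):
    n, d = len(samples), len(samples[0])
    scale = d / (n * eps)  # L1 sensitivity of the mean vector is d / n
    return [sum(x[j] for x in samples) / n + laplace(scale)
            for j in range(d)]

# Bernoulli(0.5) product distribution over {0,1}^3, n = 2000 samples.
data = [[random.randint(0, 1) for _ in range(3)] for _ in range(2000)]
priv = dp_product_means(data, eps=1.0)
print(priv)
```

The noise scale grows linearly in d for fixed n, which is precisely the sample-complexity overhead the paper's algorithm avoids.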
( 2
min )
Machine learning algorithms, both in their classical and quantum versions,
heavily rely on gradient-based optimization algorithms, such as gradient
descent and its variants. The overall performance depends on the appearance of
local minima and barren plateaus, which slow down calculations and lead to
non-optimal solutions. In practice, this results in dramatic computational and
energy costs for AI applications. In this paper we introduce a generic strategy
to accelerate and improve the overall performance of such methods, alleviating
the effect of barren plateaus and local minima. Our method is based on
coordinate transformations, somewhat similar to variational rotations, adding
extra directions in parameter space that depend on the cost function itself and
allow the configuration landscape to be explored more efficiently. The
validity of our method is benchmarked by boosting a number of quantum machine
learning algorithms, yielding a significant improvement in their performance.
( 2
min )
Edge computing solutions that enable the extraction of high-level information
from a variety of sensors are in increasingly high demand, driven by the
growing number of smart devices that require sensory processing at the edge. To
tackle this problem, we present a smart vision sensor System on Chip (SoC),
featuring an event-based camera and a low-power asynchronous spiking
Convolutional Neural Network (sCNN) computing architecture embedded on a single
chip. By combining both sensor and processing on a single die, we can lower
unit production costs significantly. Moreover, the simple end-to-end nature of
the SoC facilitates small stand-alone applications as well as functioning as an
edge node in larger systems. The event-driven nature of the vision sensor
delivers high-speed signals in a sparse data stream. This is reflected in the
processing pipeline, which focuses on optimising highly sparse computation and
minimising latency, reaching $3.36\,\mu s$ for 9 sCNN layers. Overall, this
results in an extremely low-latency visual processing pipeline deployed on a
small form factor with a low energy budget and sensor cost. We present the
asynchronous architecture, the individual blocks, and the sCNN processing
principle, and benchmark against other sCNN-capable processors.
( 3
min )
With the increasing penetration of renewable power sources such as wind and
solar, accurate short-term (nowcasting) renewable power prediction is becoming
increasingly important. This paper investigates multi-modal (MM) learning and
end-to-end (E2E) learning for nowcasting renewable power as an intermediate
to energy management systems. MM combines features from all-sky imagery and
meteorological sensor data as two modalities to predict renewable power
generation that otherwise could not be combined effectively. The combined,
predicted values are then input to a differentiable optimal power flow (OPF)
formulation simulating the energy management. For the first time, MM is
combined with E2E training of the model that minimises the expected total
system cost. The case study tests the proposed methodology on the real sky and
meteorological data from the Netherlands. In our study, the proposed MM-E2E
model reduced system cost by 30% compared to uni-modal baselines.
( 2
min )
We consider the problem of synthetically generating data that can closely
resemble human decisions made in the context of an interactive human-AI system
like a computer game. We propose a novel algorithm that can generate synthetic,
human-like, decision making data while starting from a very small set of
decision making data collected from humans. Our proposed algorithm integrates
the concept of reward shaping with an imitation learning algorithm to generate
the synthetic data. We have validated our synthetic data generation technique
by using the synthetically generated data as a surrogate for human interaction
data to solve three sequential decision making tasks of increasing complexity
within a small computer game-like setup. Different empirical and statistical
analyses of our results show that the synthetically generated data can
substitute for the human data, supporting game-playing that is almost
indistinguishable, with very low divergence, from a human performing the same
tasks.
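Potential-based reward shaping, the concept the algorithm combines with imitation learning, can be sketched in a few lines: augment the environment reward with a potential term F = gamma * phi(s') - phi(s), which is known to preserve optimal policies. The potential function and states below are illustrative placeholders, not the paper's setup.

```python
# Potential-based reward shaping on integer states; the goal state and
# potential function are hypothetical.

GAMMA = 0.9

def potential(state):
    # Hypothetical potential: negative distance to a goal state at 10.
    return -abs(10 - state)

def shaped_reward(reward, state, next_state):
    # F = gamma * phi(s') - phi(s); adding F leaves optimal policies unchanged.
    return reward + GAMMA * potential(next_state) - potential(state)

# Moving toward the goal earns a positive shaping bonus even when the
# environment reward is zero.
print(shaped_reward(0.0, state=4, next_state=5))  # → 1.5
```

Dense shaped rewards of this kind are what let an imitation learner generalise from the very small set of human demonstrations the abstract mentions.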
( 2
min )
Deep neural networks (DNNs) have been shown to be vulnerable to adversarial
examples. Moreover, the transferability of the adversarial examples has
received broad attention in recent years, which means that adversarial examples
crafted by a surrogate model can also attack unknown models. This phenomenon
gave birth to transfer-based adversarial attacks, which aim to improve the
transferability of the generated adversarial examples. In this paper, we
propose to improve the transferability of adversarial examples in the
transfer-based attack via masking unimportant parameters (MUP). The key idea in
MUP is to refine the pretrained surrogate models to boost the transfer-based
attack. Based on this idea, a Taylor expansion-based metric is used to evaluate
the parameter importance score and the unimportant parameters are masked during
the generation of adversarial examples. This process is simple, yet can be
naturally combined with various existing gradient-based optimizers for
generating adversarial examples, thus further improving the transferability of
the generated adversarial examples. Extensive experiments are conducted to
validate the effectiveness of the proposed MUP-based methods.
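The scoring-and-masking step can be sketched as follows; the weights, gradients, and masking fraction are illustrative placeholders, and a real implementation would operate on the surrogate network's parameter tensors.

```python
# Sketch of MUP: score each parameter with a first-order Taylor metric
# |w * dL/dw| and zero out (mask) the lowest-scoring fraction before
# crafting adversarial examples on the refined surrogate.

def mask_unimportant(weights, grads, mask_frac):
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    k = int(len(weights) * mask_frac)          # number of params to mask
    cutoff = sorted(scores)[k] if k > 0 else float("-inf")
    return [0.0 if s < cutoff else w for w, s in zip(weights, scores)]

weights = [0.5, -1.2, 0.05, 2.0, -0.3]
grads   = [0.1,  0.4, 0.02, 0.3,  0.01]
masked = mask_unimportant(weights, grads, mask_frac=0.4)
print(masked)  # → [0.5, -1.2, 0.0, 2.0, 0.0]
```

Because masking only edits the surrogate's parameters, any gradient-based attack can then be run on the masked model unchanged, which is the plug-in property the abstract highlights.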
( 2
min )
This paper studies the problem of online performance optimization of
constrained closed-loop control systems, where both the objective and the
constraints are unknown black-box functions affected by exogenous time-varying
contextual disturbances. A primal-dual contextual Bayesian optimization
algorithm is proposed that achieves sublinear cumulative regret with respect to
the dynamic optimal solution under certain regularity conditions. Furthermore,
the algorithm achieves zero time-average constraint violation, ensuring that
the average value of the constraint function satisfies the desired constraint.
The method is applied to both sampled instances from Gaussian processes and a
continuous stirred tank reactor parameter tuning problem; simulation results
show that the method simultaneously provides close-to-optimal performance and
maintains constraint feasibility on average. This contrasts current
state-of-the-art methods, which either suffer from large cumulative regret or
severe constraint violations for the case studies presented.
( 2
min )
Deploying deep learning models in real-world certified systems requires the
ability to provide confidence estimates that accurately reflect their
uncertainty. In this paper, we demonstrate the use of the conformal prediction
framework to construct reliable and trustworthy predictors for detecting
railway signals. Our approach is based on a novel dataset that includes images
taken from the perspective of a train operator and state-of-the-art object
detectors. We test several conformal approaches and introduce a new method
based on conformal risk control. Our findings demonstrate the potential of the
conformal prediction framework to evaluate model performance and provide
practical guidance for achieving formally guaranteed uncertainty bounds.
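The split-conformal step underlying such guarantees can be sketched as follows: from calibration nonconformity scores, pick the finite-sample-corrected empirical quantile that yields (1 - alpha) coverage on exchangeable test points. The scores below are illustrative; the paper builds on this with conformal risk control for object detection.

```python
import math

# Split conformal prediction: compute the score threshold that gives
# (1 - alpha) marginal coverage from a held-out calibration set.

def conformal_threshold(cal_scores, alpha):
    n = len(cal_scores)
    # Finite-sample-corrected quantile level: ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

cal_scores = [0.1, 0.5, 0.3, 0.2, 0.4, 0.6, 0.7, 0.9, 0.8, 1.0]
t = conformal_threshold(cal_scores, alpha=0.2)
print(t)  # → 0.9
```

At test time, every candidate detection whose nonconformity score falls at or below `t` is included in the prediction set, which inherits the coverage guarantee under exchangeability.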
( 2
min )
This paper clarifies why bias cannot be completely mitigated in Machine
Learning (ML) and proposes an end-to-end methodology to translate the ethical
principle of justice and fairness into the practice of ML development as an
ongoing agreement with stakeholders. The pro-ethical iterative process
presented in the paper aims to challenge asymmetric power dynamics in
fairness decision-making within ML design and support ML development teams to
identify, mitigate and monitor bias at each step of ML systems development. The
process also provides guidance on how to explain the always imperfect
trade-offs in terms of bias to users.
( 2
min )
In this paper, we consider the problem of learning a neural network
controller for a system required to satisfy a Signal Temporal Logic (STL)
specification. We exploit STL quantitative semantics to define a notion of
robust satisfaction. Guaranteeing the correctness of a neural network
controller, i.e., ensuring the satisfaction of the specification by the
controlled system, is a difficult problem that received a lot of attention
recently. We provide a general procedure to construct a set of trainable High
Order Control Barrier Functions (HOCBFs) enforcing the satisfaction of formulas
in a fragment of STL. We use the BarrierNet, implemented by a differentiable
Quadratic Program (dQP) with HOCBF constraints, as the last layer of the neural
network controller, to guarantee the satisfaction of the STL formulas. We train
the HOCBFs together with other neural network parameters to further improve the
robustness of the controller. Simulation results demonstrate that our approach
ensures satisfaction and outperforms existing algorithms.
( 2
min )
Over the past decade, neural network (NN)-based controllers have demonstrated
remarkable efficacy in a variety of decision-making tasks. However, their
black-box nature and the risk of unexpected behaviors and surprising results
pose a challenge to their deployment in real-world systems with strong
guarantees of correctness and safety. We address these limitations by
investigating the transformation of NN-based controllers into equivalent soft
decision tree (SDT)-based controllers and its impact on verifiability.
Unlike previous approaches, we focus on discrete-output NN
controllers including rectified linear unit (ReLU) activation functions as well
as argmax operations. We then devise an exact yet cost-effective transformation
algorithm that can automatically prune redundant branches. We evaluate
our approach using two benchmarks from the OpenAI Gym environment. Our results
indicate that the SDT transformation can benefit formal verification, showing
runtime improvements of up to 21x and 2x for MountainCar-v0 and CartPole-v0,
respectively.
( 2
min )
The GeForce RTX 4070 GPU, the latest in the 40 Series lineup, is available today starting at $599. It comes backed by NVIDIA Studio technologies, including hardware acceleration for 3D, video and AI workflows; optimizations for RTX hardware in over 110 popular creative apps; and exclusive NVIDIA Studio apps like Omniverse, Broadcast, Canvas and RTX Remix.
( 9
min )
A new adventure with publisher Bandai Namco Europe kicks off this GFN Thursday. Some of its popular titles lead seven new games joining the cloud this week. Plus, gamers can play them on more devices than ever, with native 4K streaming for GeForce NOW available on select LG Smart TVs. Better Together Bandai Namco is Read article >
( 6
min )
The seeds of a machine learning (ML) paradigm shift have existed for decades, but with the ready availability of scalable compute capacity, a massive proliferation of data, and the rapid advancement of ML technologies, customers across industries are transforming their businesses. Just recently, generative AI applications like ChatGPT have captured widespread attention and imagination. We […]
( 15
min )
Amazon CodeWhisperer is an AI coding companion that helps improve developer productivity by generating code recommendations based on their comments in natural language and code in the integrated development environment (IDE). CodeWhisperer accelerates completion of coding tasks by reducing context-switches between the IDE and documentation or developer forums. With real-time code recommendations from CodeWhisperer, you […]
( 6
min )
Over the past few years, large knowledge bases have been constructed to store
massive amounts of knowledge. However, these knowledge bases are highly
incomplete, for example, over 70% of people in Freebase have no known place of
birth. To solve this problem, we propose a query-driven knowledge base
completion system with multimodal fusion of unstructured and structured
information. To effectively fuse unstructured information from the Web and
structured information in knowledge bases to achieve good performance, our
system builds multimodal knowledge graphs based on question answering and rule
inference. We propose a multimodal path fusion algorithm to rank candidate
answers based on different paths in the multimodal knowledge graphs, achieving
much better performance than question answering, rule inference and a baseline
fusion algorithm. To improve system efficiency, query-driven techniques are
utilized to reduce the runtime of our system, providing fast responses to user
queries. Extensive experiments have been conducted to demonstrate the
effectiveness and efficiency of our system.
( 2
min )
Foundation models have taken over natural language processing and image
generation domains due to the flexibility of prompting. With the recent
introduction of the Segment Anything Model (SAM), this prompt-driven paradigm
has entered image segmentation with a hitherto unexplored abundance of
capabilities. The purpose of this paper is to conduct an initial evaluation of
the out-of-the-box zero-shot capabilities of SAM for medical image
segmentation, by evaluating its performance on an abdominal CT organ
segmentation task, via point or bounding box based prompting. We show that SAM
generalizes well to CT data, making it a potential catalyst for the advancement
of semi-automatic segmentation tools for clinicians. We believe that this
foundation model, while not reaching state-of-the-art segmentation performance
in our investigations, can serve as a highly potent starting point for further
adaptations of such models to the intricacies of the medical domain. Keywords:
medical image segmentation, SAM, foundation models, zero-shot learning
( 2
min )
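Organ-segmentation evaluations like the one described are typically scored with the Dice coefficient between predicted and ground-truth masks. A self-contained sketch (the masks here are toy examples, not SAM outputs):

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks (1.0 = perfect overlap)."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * inter / denom if denom else 1.0

gt = np.zeros((8, 8), int);   gt[2:6, 2:6] = 1    # 16-pixel "organ"
pred = np.zeros((8, 8), int); pred[3:7, 2:6] = 1  # prediction shifted by one row
score = dice_score(pred, gt)                      # 2*12 / (16+16) = 0.75
```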
Brain-inspired hyperdimensional computing (HDC) has been recently considered
a promising learning approach for resource-constrained devices. However,
existing approaches use static encoders that are never updated during the
learning process. Consequently, it requires a very high dimensionality to
achieve adequate accuracy, severely lowering the encoding and training
efficiency. In this paper, we propose DistHD, a novel dynamic encoding
technique for HDC adaptive learning that effectively identifies and regenerates
dimensions that mislead the classification and compromise the learning quality.
Our proposed algorithm DistHD successfully accelerates the learning process and
achieves the desired accuracy with considerably lower dimensionality.
( 2
min )
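The contrast between a static encoder and dynamic dimension regeneration can be sketched as follows. The separability score and the 10% regeneration fraction are illustrative assumptions, not the actual DistHD criterion:

```python
import numpy as np

rng = np.random.default_rng(0)
D, F = 512, 16                      # hypervector dimensionality, input features
proj = rng.standard_normal((D, F))  # static random-projection encoder

X0 = rng.standard_normal((50, F)) + 1.0   # toy class-0 samples
X1 = rng.standard_normal((50, F)) - 1.0   # toy class-1 samples
H0 = np.sign(X0 @ proj.T)                 # (50, D) binary hypervectors
H1 = np.sign(X1 @ proj.T)

# Dynamic step: rank dimensions by how well they separate the classes,
# then regenerate the least informative ones with fresh random rows.
sep = np.abs(H0.mean(axis=0) - H1.mean(axis=0))
worst = np.argsort(sep)[: D // 10]
proj[worst] = rng.standard_normal((worst.size, F))
```

A static encoder would keep `proj` fixed forever; regenerating misleading dimensions is what lets a dynamic encoder reach the same accuracy at lower D.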
A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random
variables (the vertices); a Bayesian Network Distribution (BND) is a
probability distribution on the random variables that is Markovian on the
graph. A finite $k$-mixture of such models is graphically represented by a
larger graph which has an additional "hidden" (or "latent") random variable
$U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other
vertex. Models of this type are fundamental to causal inference, where $U$
models an unobserved confounding effect of multiple populations, obscuring the
causal relationships in the observable DAG. By solving the mixture problem and
recovering the joint probability distribution on $U$, traditionally
unidentifiable causal relationships become identifiable. Using a reduction to
the better-studied "product" case on empty graphs, we give the first
algorithm to learn mixtures of non-empty DAGs.
( 2
min )
Deep feedforward networks initialized along the edge of chaos exhibit
exponentially superior training ability as quantified by maximum trainable
depth. In this work, we explore the effect of saturation of the tanh activation
function along the edge of chaos. In particular, we determine the line of
uniformity in phase space along which the post-activation distribution has
maximum entropy. This line intersects the edge of chaos, and indicates the
regime beyond which saturation of the activation function begins to impede
training efficiency. Our results suggest that initialization along the edge of
chaos is a necessary but not sufficient condition for optimal trainability.
( 2
min )
Statistical optimality benchmarking is crucial for analyzing and designing
time series classification (TSC) algorithms. This study proposes to benchmark
the optimality of TSC algorithms in distinguishing diffusion processes by the
likelihood ratio test (LRT). The LRT is an optimal classifier by the
Neyman-Pearson lemma. The LRT benchmarks are computationally efficient because
the LRT does not need training, and the diffusion processes can be efficiently
simulated and are flexible to reflect the specific features of real-world
applications. We demonstrate the benchmarking with three widely-used TSC
algorithms: random forest, ResNet, and ROCKET. These algorithms can achieve the
LRT optimality for univariate time series and multivariate Gaussian processes.
However, these model-agnostic algorithms are suboptimal in classifying
high-dimensional nonlinear multivariate time series. Additionally, the LRT
benchmark provides tools to analyze the dependence of classification accuracy
on the time length, dimension, temporal sampling frequency, and randomness of
the time series.
( 2
min )
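For the simplest case, the LRT benchmark reduces to thresholding a log likelihood ratio. A minimal sketch for i.i.d. Gaussian observations with known means (a toy stand-in for the diffusion processes in the paper):

```python
import numpy as np

def log_lr(x, mu0, mu1, sigma=1.0):
    """Log likelihood ratio for i.i.d. Gaussian observations with known means;
    by the Neyman-Pearson lemma, thresholding this statistic at 0 gives the
    optimal (equal-prior) classifier between the two hypotheses."""
    x = np.asarray(x, float)
    ll1 = -0.5 * np.sum((x - mu1) ** 2) / sigma ** 2
    ll0 = -0.5 * np.sum((x - mu0) ** 2) / sigma ** 2
    return ll1 - ll0

stat = log_lr(np.full(100, 0.25), mu0=0.0, mu1=0.3)  # positive -> class 1
```

No training is involved, which is why such benchmarks are cheap to compute.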
The convergence rates for convex and non-convex optimization methods depend
on the choice of a host of constants, including step sizes, Lyapunov function
constants and momentum constants. In this work we propose the use of factorial
powers as a flexible tool for defining constants that appear in convergence
proofs. We list a number of remarkable properties that these sequences enjoy,
and show how they can be applied to convergence proofs to simplify or improve
the convergence rates of the momentum method, accelerated gradient and the
stochastic variance reduced method (SVRG).
( 2
min )
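One of the "remarkable properties" of factorial powers is the discrete analogue of the power rule, which is what makes them convenient in telescoping convergence-rate sums. A quick numerical check using falling factorial powers:

```python
def falling(k, r):
    """Falling factorial power k^(r) = k * (k-1) * ... * (k-r+1)."""
    out = 1
    for i in range(r):
        out *= (k - i)
    return out

# Discrete power rule: sum_{k=0}^{n-1} k^(r) = n^(r+1) / (r+1)
n, r = 10, 3
lhs = sum(falling(k, r) for k in range(n))
rhs = falling(n, r + 1) // (r + 1)
```

Sums like `lhs` appear when unrolling momentum or SVRG recursions, and the closed form `rhs` is what simplifies the resulting rates.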
Experts convene to peek under the hood of AI-generated code, language, and images as well as its capabilities, limitations, and future impact.
( 11
min )
Martin Luther King Jr. Scholar Brian Nord trains machines to explore the cosmos and fights for equity in research.
( 9
min )
This is a guest post co-written with Moulham Zahabi from Matarat. Probably everyone has checked their baggage when flying, and waited anxiously for their bags to appear at the carousel. Successful and timely delivery of your bags depends on a massive infrastructure called the baggage handling system (BHS). This infrastructure is one of the key […]
( 13
min )
This is a guest post by Carter Huffman, CTO and Co-founder at Modulate. Modulate is a Boston-based startup on a mission to build richer, safer, more inclusive online gaming experiences for everyone. We’re a team of world-class audio experts, gamers, allies, and futurists who are eager to build a better online world and make voice […]
( 7
min )
Globally, many organizations have critical business data dispersed among various content repositories, making it difficult to access this information in a streamlined and cohesive manner. Creating a unified and secure search experience is a significant challenge for organizations because each repository contains a wide range of document formats and access control mechanisms. Amazon Kendra is […]
( 10
min )
This is a guest blog post co-written with Hussain Jagirdar from Games24x7. Games24x7 is one of India’s most valuable multi-game platforms and entertains over 100 million gamers across various skill games. With “Science of Gaming” as their core philosophy, they have enabled a vision of end-to-end informatics around game dynamics, game platforms, and players by […]
( 11
min )
Creating a map requires masterful geographical knowledge, artistic skill and evolving technologies that have taken people from using hand-drawn sketches to satellite imagery. Just as important, changes need to be navigated in the way people consume maps, from paper charts to GPS navigation and interactive online charts. The way people think about video games is Read article >
( 6
min )
Imagine a stroller that can drive itself, help users up hills, brake on slopes and provide alerts of potential hazards. That’s what GlüxKind has done with Ella, an award-winning smart stroller that uses the NVIDIA Jetson edge AI and robotics platform to power its AI features. Kevin Huang and Anne Hunger are the co-founders of Read article >
( 5
min )
Deep classifier neural networks enter the terminal phase of training (TPT)
when training error reaches zero and tend to exhibit intriguing Neural Collapse
(NC) properties. Neural collapse essentially represents a state at which the
within-class variability of final hidden layer outputs is infinitesimally small
and their class means form a simplex equiangular tight frame. This simplifies
the last layer behaviour to that of a nearest-class center decision rule.
Despite the simplicity of this state, the dynamics and implications of reaching
it are yet to be fully understood. In this work, we review the principles which
aid in modelling neural collapse, followed by the implications of this state on
generalization and transfer learning capabilities of neural networks. Finally,
we conclude by discussing potential avenues and directions for future research.
( 2
min )
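The simplex equiangular tight frame (ETF) structure mentioned above is easy to verify numerically: in one standard construction, the K class-mean directions are unit vectors whose pairwise inner products all equal -1/(K-1) (they span a (K-1)-dimensional subspace):

```python
import numpy as np

K = 4
# Standard simplex ETF construction: columns of M are the class-mean directions.
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
G = M.T @ M   # Gram matrix: ones on the diagonal, -1/(K-1) off the diagonal
```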
Understanding decisions made by neural networks is key for the deployment of
intelligent systems in real world applications. However, the opaque decision
making process of these systems is a disadvantage where interpretability is
essential. Many feature-based explanation techniques have been introduced over
the last few years in the field of machine learning to better understand
decisions made by neural networks and have become an important component to
verify their reasoning capabilities. However, existing methods do not allow
statements to be made about the uncertainty regarding a feature's relevance for
the prediction. In this paper, we introduce Monte Carlo Relevance Propagation
(MCRP), a simple but powerful method for feature relevance uncertainty
estimation based on Monte Carlo estimation of the feature relevance
distribution. MCRP computes feature relevance uncertainty scores that allow a
deeper understanding of a neural network's perception and reasoning.
( 2
min )
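The Monte Carlo idea can be sketched generically. Here gradient-times-input on a tiny tanh network stands in for the relevance rule (the actual MCRP propagation rule is defined in the paper); sampling under input noise yields a relevance distribution whose spread is the uncertainty score:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((8, 4)), rng.standard_normal((1, 8))

def relevance(x):
    """Gradient x input relevance for a tiny tanh network (illustrative rule)."""
    h = np.tanh(W1 @ x)
    grad = (W2 * (1 - h ** 2)).reshape(-1) @ W1   # d(output) / d(input)
    return grad * x

x = np.array([1.0, -0.5, 0.3, 2.0])
samples = np.stack([relevance(x + 0.05 * rng.standard_normal(4))
                    for _ in range(500)])          # Monte Carlo samples
mean_rel, std_rel = samples.mean(0), samples.std(0)  # relevance + uncertainty
```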
Object detection is a crucial task in computer vision that aims to identify
and localize objects in images or videos. The recent advancements in deep
learning and Convolutional Neural Networks (CNNs) have significantly improved
the performance of object detection techniques. This paper presents a
comprehensive study of object detection techniques in unconstrained
environments, including various challenges, datasets, and state-of-the-art
approaches. Additionally, we present a comparative analysis of the methods and
highlight their strengths and weaknesses. Finally, we provide some future
research directions to further improve object detection in unconstrained
environments.
( 2
min )
Deep machine learning models including Convolutional Neural Networks (CNN)
have been successful in the detection of Mild Cognitive Impairment (MCI) using
medical images, questionnaires, and videos. This paper proposes a novel
Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to
distinguish MCI from those with normal cognition by analyzing facial features.
The data comes from the I-CONECT, a behavioral intervention trial aimed at
improving cognitive function by providing frequent video chats. MC-ViViT
extracts spatiotemporal features of videos in one branch and augments
representations by the MC module. The I-CONECT dataset is challenging, as it
is imbalanced, containing Hard-Easy and Positive-Negative samples, which
impedes the performance of MC-ViViT. We propose a loss function for Hard-Easy
and Positive-Negative Samples (HP Loss) by combining Focal loss and AD-CORRE
loss to address the imbalanced problem. Our experimental results on the
I-CONECT dataset show the great potential of MC-ViViT in predicting MCI with a
high accuracy of 90.63% on some of the interview videos.
( 2
min )
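One ingredient of the HP Loss, the standard binary focal loss, can be sketched as follows; the point is that the (1 - p_t)^gamma factor down-weights easy examples, which is what helps with the Hard-Easy imbalance (values below are illustrative):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    p_t = np.where(y == 1, p, 1 - p)
    a_t = np.where(y == 1, alpha, 1 - alpha)
    return float(np.mean(-a_t * (1 - p_t) ** gamma * np.log(p_t)))

easy = focal_loss([0.95], [1])  # confident, correct -> tiny loss
hard = focal_loss([0.30], [1])  # confident, wrong -> dominates the batch
```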
Recently, large language models (LLMs) like ChatGPT have demonstrated
remarkable performance across a variety of natural language processing tasks.
However, their effectiveness in the financial domain, specifically in
predicting stock market movements, remains to be explored. In this paper, we
conduct an extensive zero-shot analysis of ChatGPT's capabilities in multimodal
stock movement prediction, on three tweets and historical stock price datasets.
Our findings indicate that ChatGPT is a "Wall Street Neophyte" with limited
success in predicting stock movements, as it underperforms not only
state-of-the-art methods but also traditional methods like linear regression
using price features. Despite the potential of Chain-of-Thought prompting
strategies and the inclusion of tweets, ChatGPT's performance remains subpar.
Furthermore, we observe limitations in its explainability and stability,
suggesting the need for more specialized training or fine-tuning. This research
provides insights into ChatGPT's capabilities and serves as a foundation for
future work aimed at improving financial market analysis and prediction by
leveraging social media sentiment and historical stock data.
( 2
min )
We study a game between autobidding algorithms that compete in an online
advertising platform. Each autobidder is tasked with maximizing its
advertiser's total value over multiple rounds of a repeated auction, subject to
budget and/or return-on-investment constraints. We propose a gradient-based
learning algorithm that is guaranteed to satisfy all constraints and achieves
vanishing individual regret. Our algorithm uses only bandit feedback and can be
used with the first- or second-price auction, as well as with any
"intermediate" auction format. Our main result is that when these autobidders
play against each other, the resulting expected liquid welfare over all rounds
is at least half of the expected optimal liquid welfare achieved by any
allocation. This holds whether or not the bidding dynamics converge to an
equilibrium and regardless of the correlation structure between advertiser
valuations.
( 2
min )
The paper presents a modular approach for the estimation of a leading
vehicle's velocity based on a non-intrusive stereo camera where SiamMask is
used for leading vehicle tracking, Kernel Density estimate (KDE) is used to
smooth the distance prediction from a disparity map, and LightGBM is used for
leading vehicle velocity estimation.
Our approach yields an RMSE of 0.416, which outperforms the baseline RMSE of
0.582 on the SUBARU Image Recognition Challenge.
( 2
min )
Despite the vast body of literature on Active Learning (AL), there is no
comprehensive and open benchmark allowing for efficient and simple comparison
of proposed samplers. Additionally, the variability in experimental settings
across the literature makes it difficult to choose a sampling strategy, which
is critical due to the one-off nature of AL experiments. To address those
limitations, we introduce OpenAL, a flexible and open-source framework to
easily run and compare sampling AL strategies on a collection of realistic
tasks. The proposed benchmark is augmented with interpretability metrics and
statistical analysis methods to understand when and why some samplers
outperform others. Last but not least, practitioners can easily extend the
benchmark by submitting their own AL samplers.
( 2
min )
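A representative AL sampler of the kind such a benchmark compares is entropy-based uncertainty sampling; a minimal sketch (not tied to the OpenAL API):

```python
import numpy as np

def uncertainty_sample(probs, k):
    """Pick the k pool points whose predicted class distribution has the
    highest entropy (i.e., where the current model is least certain)."""
    probs = np.asarray(probs, float)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:k]

pool = np.array([[0.98, 0.02],   # confident prediction
                 [0.55, 0.45],   # highly uncertain
                 [0.70, 0.30]])
picked = uncertainty_sample(pool, 1)   # selects the most uncertain point
```

A benchmark then labels the selected points, retrains, and repeats, comparing samplers by their learning curves.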
We developed a prototype device for dynamic gaze and accommodation
measurements based on 4 Purkinje reflections (PR) suitable for use in AR and
ophthalmology applications. PR1&2 and PR3&4 are used for accurate gaze and
accommodation measurements, respectively. Our eye model was developed in ZEMAX
and matches the experiments well. Our model predicts the accommodation from 4
diopters to 1 diopter with better than 0.25D accuracy. We performed
repeatability tests and obtained accurate gaze and accommodation estimations
from subjects. We are generating a large synthetic data set using physically
accurate models and machine learning.
( 2
min )
The consumption of microbial-contaminated food and water is responsible for
the deaths of millions of people annually. Smartphone-based microscopy systems
are portable, low-cost, and more accessible alternatives for the detection of
Giardia and Cryptosporidium than traditional brightfield microscopes. However,
the images from smartphone microscopes are noisier and require manual cyst
identification by trained technicians, usually unavailable in resource-limited
settings. Automatic detection of (oo)cysts using deep-learning-based object
detection could offer a solution for this limitation. We evaluate the
performance of three state-of-the-art object detectors to detect (oo)cysts of
Giardia and Cryptosporidium on a custom dataset that includes both smartphone
and brightfield microscopic images from vegetable samples. Faster RCNN,
RetinaNet, and you only look once (YOLOv8s) deep-learning models were employed
to explore their efficacy and limitations. Our results show that while the
deep-learning models perform better with the brightfield microscopy image
dataset than the smartphone microscopy image dataset, the smartphone microscopy
predictions are still comparable to the prediction performance of non-experts.
( 2
min )
Deep learning based approaches like Physics-informed neural networks (PINNs)
and DeepONets have shown promise on solving PDE constrained optimization
(PDECO) problems. However, existing methods are insufficient to handle those
PDE constraints that have a complicated or nonlinear dependency on optimization
targets. In this paper, we present a novel bi-level optimization framework to
resolve the challenge by decoupling the optimization of the targets and
constraints. For the inner loop optimization, we adopt PINNs to solve the PDE
constraints only. For the outer loop, we design a novel method by using
Broyden's method based on the Implicit Function Theorem (IFT), which is
efficient and accurate for approximating hypergradients. We further present
theoretical explanations and error analysis of the hypergradients computation.
Extensive experiments on multiple large-scale and nonlinear PDE constrained
optimization problems demonstrate that our method achieves state-of-the-art
results compared with strong baselines.
( 2
min )
This paper introduces a novel representation of Convolutional Neural Networks
(CNNs) in terms of 2-D dynamical systems. To this end, the usual description of
convolutional layers with convolution kernels, i.e., the impulse responses of
linear filters, is realized in state space as a linear time-invariant 2-D
system. The overall convolutional neural network, composed of convolutional
layers and nonlinear activation functions, is then viewed as a 2-D version of a
Lur'e system, i.e., a linear dynamical system interconnected with static
nonlinear components. One benefit of this 2-D Lur'e system perspective on CNNs
is that we can use robust control theory much more efficiently for Lipschitz
constant estimation than previously possible.
( 2
min )
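For context, the baseline that such robust-control estimates improve on is the naive Lipschitz bound: the product of the layers' spectral norms, valid for any network with 1-Lipschitz activations (ReLU, tanh). A quick sketch with random weights:

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 8)), rng.standard_normal((4, 16))]

# Naive Lipschitz upper bound: product of per-layer spectral norms.
naive_bound = float(np.prod([np.linalg.norm(W, 2) for W in layers]))
```

The Lur'e-system approach yields tighter certified constants than this product bound, at the cost of solving a semidefinite feasibility problem.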
Artificial neural networks are promising for general function approximation
but challenging to train on non-independent or non-identically distributed data
due to catastrophic forgetting. The experience replay buffer, a standard
component in deep reinforcement learning, is often used to reduce forgetting
and improve sample efficiency by storing experiences in a large buffer and
using them for training later. However, a large replay buffer results in a
heavy memory burden, especially for onboard and edge devices with limited
memory capacities. We propose memory-efficient reinforcement learning
algorithms based on the deep Q-network algorithm to alleviate this problem. Our
algorithms reduce forgetting and maintain high sample efficiency by
consolidating knowledge from the target Q-network to the current Q-network.
Compared to baseline methods, our algorithms achieve comparable or better
performance in both feature-based and image-based tasks while easing the burden
of large experience replay buffers.
( 2
min )
Recently proposed BERT-based evaluation metrics for text generation perform
well on standard benchmarks but are vulnerable to adversarial attacks, e.g.,
relating to information correctness. We argue that this stems (in part) from
the fact that they are models of semantic similarity. In contrast, we develop
evaluation metrics based on Natural Language Inference (NLI), which we deem a
more appropriate modeling. We design a preference-based adversarial attack
framework and show that our NLI based metrics are much more robust to the
attacks than the recent BERT-based metrics. On standard benchmarks, our NLI
based metrics outperform existing summarization metrics, but perform below SOTA
MT metrics. However, when combining existing metrics with our NLI metrics, we
obtain both higher adversarial robustness (15%-30%) and higher quality metrics
as measured on standard benchmarks (+5% to 30%).
( 2
min )
In the past few years, more and more AI applications have been applied to
edge devices. However, models trained by data scientists with machine learning
frameworks, such as PyTorch or TensorFlow, cannot be seamlessly executed on
edge devices. In this paper, we develop an end-to-end code generator parsing a
pre-trained model to C source libraries for the backend using MicroTVM, a
machine learning compiler framework extension addressing inference on bare
metal devices. An analysis shows that specific compute-intensive operators can
be easily offloaded to the dedicated accelerator with a Universal Modular
Accelerator (UMA) interface, while others are processed in the CPU cores. By
using the automatically generated ahead-of-time C runtime, we conduct a hand
gesture recognition experiment on an ARM Cortex M4F core.
( 2
min )
These lecture notes provide an overview of Neural Network architectures from
a mathematical point of view. Especially, Machine Learning with Neural Networks
is seen as an optimization problem. Covered are an introduction to Neural
Networks and the following architectures: Feedforward Neural Network,
Convolutional Neural Network, ResNet, and Recurrent Neural Network.
( 2
min )
Classic online prediction algorithms, such as Hedge, are inherently unfair by
design, as they try to play the most rewarding arm as many times as possible
while ignoring the sub-optimal arms to achieve sublinear regret. In this paper,
we consider a fair online prediction problem in the adversarial setting with
hard lower bounds on the rate of accrual of rewards for all arms. By combining
elementary queueing theory with online learning, we propose a new online
prediction policy, called BanditQ, that achieves the target rate constraints
while achieving a regret of $O(T^{3/4})$ in the full-information setting. The
design and analysis of BanditQ involve a novel use of the potential function
method and are of independent interest.
( 2
min )
Geometric deep learning enables the encoding of physical symmetries in
modeling 3D objects. Despite rapid progress in encoding 3D symmetries into
Graph Neural Networks (GNNs), a comprehensive evaluation of the expressiveness
of these networks through a local-to-global analysis is still lacking. In this
paper, we propose a local hierarchy of 3D isomorphism to evaluate the
expressive power of equivariant GNNs and investigate the process of
representing global geometric information from local patches. Our work leads to
two crucial modules for designing expressive and efficient geometric GNNs;
namely local substructure encoding (LSE) and frame transition encoding (FTE).
To demonstrate the applicability of our theory, we propose LEFTNet which
effectively implements these modules and achieves state-of-the-art performance
on both scalar-valued and vector-valued molecular property prediction tasks. We
further point out the design space for future developments of equivariant graph
neural networks. Our codes are available at
\url{https://github.com/yuanqidu/LeftNet}.
( 2
min )
Dynamic spectrum access systems typically require information about the
spectrum occupancy and thus the presence of other users in order to make a
spectrum allocation decision for a new device. Simple methods of spectrum
occupancy detection are often far from reliable, hence spectrum occupancy
detection algorithms supported by machine learning or artificial intelligence
are widely and successfully used. To protect the privacy of user data and to
reduce the amount of control data, an interesting approach is to use federated
machine learning. This paper compares two approaches to system design using
federated machine learning: with and without a central node.
( 2
min )
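The central-node variant corresponds to standard federated averaging, where the server combines client models weighted by their local dataset sizes; a minimal sketch (the decentralized variant would instead average between neighboring nodes):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Central-node aggregation: size-weighted average of client parameters."""
    sizes = np.asarray(client_sizes, float)
    return sum(w * (n / sizes.sum()) for w, n in zip(client_weights, sizes))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
global_w = fed_avg(clients, [100, 300])   # second client counts 3x as much
```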
Breast cancer is one of the most common and dangerous cancers in women, while
it can also afflict men. Breast cancer treatment and detection are greatly
aided by the use of histopathological images since they contain sufficient
phenotypic data. A Deep Neural Network (DNN) is commonly employed to improve
accuracy and breast cancer detection. In our research, we have analyzed
pre-trained deep transfer learning models such as ResNet50, ResNet101, VGG16,
and VGG19 for detecting breast cancer using the 2453 histopathology images
dataset. Images in the dataset were separated into two categories: those with
invasive ductal carcinoma (IDC) and those without IDC. After analyzing the
transfer learning model, we found that ResNet50 outperformed other models,
achieving accuracy rates of 90.2%, Area under Curve (AUC) rates of 90.0%,
recall rates of 94.7%, and a marginal loss of 3.5%.
( 2
min )
In the automotive industry, the full cycle of managing in-use vehicle quality
issues can take weeks to investigate. The process involves isolating root
causes, defining and implementing appropriate treatments, and refining
treatments if needed. The main pain-point is the lack of a systematic method to
identify causal relationships, evaluate treatment effectiveness, and direct the
next actionable treatment if the current treatment was deemed ineffective. This
paper will show how we leverage causal Machine Learning (ML) to speed up such
processes. A real-word data set collected from on-road vehicles will be used to
demonstrate the proposed framework. Open challenges for vehicle quality
applications will also be discussed.
( 2
min )
We present a deep-learning based approach for measuring small planetary
radial velocities in the presence of stellar variability. We use neural
networks to reduce stellar RV jitter in three years of HARPS-N sun-as-a-star
spectra. We develop and compare dimensionality-reduction and data splitting
methods, as well as various neural network architectures including single line
CNNs, an ensemble of single line CNNs, and a multi-line CNN. We inject
planet-like RVs into the spectra and use the network to recover them. We find
that the multi-line CNN is able to recover planets with 0.2 m/s semi-amplitude,
50 day period, with 8.8% error in the amplitude and 0.7% in the period. This
approach shows promise for mitigating stellar RV variability and enabling the
detection of small planetary RVs with unprecedented precision.
( 2
min )
Object pose estimation is a critical task in robotics for precise object
manipulation. However, current techniques heavily rely on a reference 3D
object, limiting their generalizability and making it expensive to expand to
new object categories. Direct pose predictions also provide limited information
for robotic grasping without referencing the 3D model. Keypoint-based methods
offer intrinsic descriptiveness without relying on an exact 3D model, but they
may lack consistency and accuracy. To address these challenges, this paper
proposes ShapeShift, a superquadric-based framework for object pose estimation
that predicts the object's pose relative to a primitive shape which is fitted
to the object. The proposed framework offers intrinsic descriptiveness and the
ability to generalize to arbitrary geometric shapes beyond the training set.
( 2
min )
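The primitive shapes used here are defined by the superquadric inside-outside function, which is what a fitting procedure evaluates; a minimal sketch (parameter names are the conventional ones, not taken from the paper):

```python
import numpy as np

def superquadric_f(p, scale=(1.0, 1.0, 1.0), eps=(1.0, 1.0)):
    """Superquadric implicit function: F < 1 inside, F = 1 on the surface,
    F > 1 outside. eps = (eps1, eps2) controls squareness; (1, 1) gives an
    ellipsoid, values near 0 give box-like shapes."""
    x, y, z = np.abs(np.asarray(p, float)) / np.asarray(scale, float)
    e1, e2 = eps
    return (x ** (2 / e2) + y ** (2 / e2)) ** (e2 / e1) + z ** (2 / e1)

on_surface = superquadric_f([1.0, 0.0, 0.0])   # exactly on the unit sphere
inside = superquadric_f([0.2, 0.2, 0.2])       # well inside
```

Predicting pose relative to such a fitted primitive is what removes the need for an exact 3D model of each object.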
Although neural networks (especially deep neural networks) have achieved
\textit{better-than-human} performance in many fields, their real-world
deployment is still questionable due to their lack of awareness of the
limitations of their knowledge. To incorporate such awareness in the machine
learning model, prediction with reject option (also known as selective
classification or classification with abstention) has been proposed in
literature. In this paper, we present a systematic review of the prediction
with the reject option in the context of various neural networks. To the best
of our knowledge, this is the first study focusing on this aspect of neural
networks. Moreover, we discuss different novel loss functions related to the
reject option and post-training processing (if any) of network output for
generating suitable measures of the model's knowledge awareness. Finally, we
address the application of the reject option in reducing prediction time for
real-time problems and present a comprehensive summary of techniques related
to the reject option across an extensive variety of neural networks. Our code
is available on GitHub:
\url{https://github.com/MehediHasanTutul/Reject_option}
( 2
min )
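Reject-option prediction, as surveyed above, is often realized post-training by thresholding the network's confidence. A minimal sketch (our illustration, not a method from the review; the threshold value is an arbitrary choice):

```python
import numpy as np

def predict_with_reject(logits, threshold=0.9):
    """Softmax-confidence reject rule: abstain (-1) when the top-class
    probability falls below `threshold`."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    preds[conf < threshold] = -1  # -1 denotes rejection / abstention
    return preds, conf

logits = np.array([[4.0, 0.0, 0.0],   # confident -> predict class 0
                   [0.6, 0.5, 0.4]])  # ambiguous -> reject
preds, conf = predict_with_reject(logits)
```

The reject rate then trades off against accuracy on the accepted samples; the surveyed loss functions learn this trade-off instead of fixing a threshold by hand.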
Epilepsy is the most common neurological disorder and an accurate forecast of
seizures would help to overcome the patient's uncertainty and helplessness. In
this contribution, we present and discuss a novel methodology for the
classification of intracranial electroencephalography (iEEG) for seizure
prediction. Contrary to previous approaches, we categorically refrain from an
extraction of hand-crafted features and use a convolutional neural network
(CNN) topology instead for both the determination of suitable signal
characteristics and the binary classification of preictal and interictal
segments. Three different models have been evaluated on public datasets with
long-term recordings from four dogs and three patients. Overall, our findings
demonstrate the general applicability of the approach; we also discuss the
strengths and limitations of our methodology.
( 2
min )
Understanding decisions made by neural networks is key for the deployment of
intelligent systems in real world applications. However, the opaque decision
making process of these systems is a disadvantage where interpretability is
essential. Many feature-based explanation techniques have been introduced over
the last few years in the field of machine learning to better understand
decisions made by neural networks and have become an important component to
verify their reasoning capabilities. However, existing methods do not allow
statements to be made about the uncertainty regarding a feature's relevance for
the prediction. In this paper, we introduce Monte Carlo Relevance Propagation
(MCRP) for feature relevance uncertainty estimation. MCRP is a simple but
powerful method based on Monte Carlo estimation of the feature relevance
distribution; it computes feature relevance uncertainty scores that allow a
deeper understanding of a neural network's perception and reasoning.
( 2
min )
We propose a hierarchical tensor-network approach for approximating
high-dimensional probability density via empirical distribution. This leverages
randomized singular value decomposition (SVD) techniques and involves solving
linear equations for tensor cores in this tensor network. The complexity of the
resulting algorithm scales linearly in the dimension of the high-dimensional
density. An analysis of the estimation error, together with several numerical
experiments, demonstrates the effectiveness of this method.
( 2
min )
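The randomized-SVD building block mentioned above can be sketched in a few lines (a generic Halko-Martinsson-Tropp-style sketch, not the paper's tensor-network solver; the oversampling size is an assumption):

```python
import numpy as np

def randomized_svd(A, rank, n_oversample=5, seed=0):
    """Randomized SVD: sketch the range of A with a Gaussian test matrix,
    then run an exact SVD on the small projected matrix."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(A.shape[1], rank + n_oversample))
    Q, _ = np.linalg.qr(A @ G)              # orthonormal basis for range(A @ G)
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 80))  # exactly rank 5
U, s, Vt = randomized_svd(A, rank=5)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
# err is near machine precision for an exactly low-rank matrix
```

The cost is dominated by the two passes over A, which is what keeps the overall algorithm linear in the dimension when applied core by core.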
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization
method for deep networks that has exhibited performance improvements on image
and language prediction problems. We show that when SAM is applied with a
convex quadratic objective, for most random initializations it converges to a
cycle that oscillates between either side of the minimum in the direction with
the largest curvature, and we provide bounds on the rate of convergence.
In the non-quadratic case, we show that such oscillations effectively perform
gradient descent, with a smaller step-size, on the spectral norm of the
Hessian. In such cases, SAM's update may be regarded as a third derivative --
the derivative of the Hessian in the leading eigenvector direction -- that
encourages drift toward wider minima.
( 2
min )
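The oscillation described above can be simulated with a minimal SAM iteration on a one-dimensional quadratic (our illustrative sketch; rho and the learning rate are arbitrary choices):

```python
import numpy as np

def sam_step(w, grad, rho, lr):
    """One SAM update: ascend along the normalized gradient to a worst-case
    perturbation, then descend using the gradient at the perturbed point."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad(w + eps)

# Convex quadratic f(w) = 0.5 * a * w**2 with curvature a
a = 2.0
grad = lambda w: a * w
w = np.array([1.0])
traj = [w.item()]
for _ in range(200):
    w = sam_step(w, grad, rho=0.1, lr=0.4)
    traj.append(w.item())
# Late iterates settle into a two-cycle around the minimum instead of
# converging to 0: w_{t+1} = (1 - lr*a) * w_t - lr*a*rho * sign(w_t)
```

In this 1-D case the cycle amplitude solves x = lr*a*rho / (2 - lr*a), consistent with the claim that the iterates oscillate on either side of the minimum along the highest-curvature direction.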
The recipe behind the success of deep learning has been the combination of
neural networks and gradient-based optimization. Understanding the behavior of
gradient descent however, and particularly its instability, has lagged behind
its empirical success. To add to the theoretical tools available to study
gradient descent we propose the principal flow (PF), a continuous time flow
that approximates gradient descent dynamics. To our knowledge, the PF is the
only continuous flow that captures the divergent and oscillatory behaviors of
gradient descent, including escaping local minima and saddle points. Through
its dependence on the eigendecomposition of the Hessian the PF sheds light on
the recently observed edge of stability phenomena in deep learning. Using our
new understanding of instability we propose a learning rate adaptation method
which enables us to control the trade-off between training stability and test
set evaluation performance.
( 2
min )
I am doing a thesis on this topic and I am working with the software EVA3D. I have limited experience working with ML algorithms and I am struggling to make this software work on input that I provide. The output of the thesis is a working piece of software that transforms 2D images into 3D mesh models. I am using EVA3D as starting code and I want to work on its limitations from there, but, as I mentioned, I am struggling to work with it. If someone can provide me with a solution for how to change the dataset.py file to match the manual input that I provide, I would be very grateful.
And if anyone has suggestions for other repos or software, please link them. Thanks.
submitted by /u/IsDeathTheStart
( 44
min )
Financial services, the gig economy, telco, healthcare, social networking, and other customers use face verification during online onboarding, step-up authentication, age-based access restriction, and bot detection. These customers verify user identity by matching the user’s face in a selfie captured by a device camera with a government-issued identity card photo or preestablished profile photo. They […]
( 10
min )
Developing web interfaces to interact with a machine learning (ML) model is a tedious task. With Streamlit, developing demo applications for your ML solution is easy. Streamlit is an open-source Python library that makes it easy to create and share web apps for ML and data science. As a data scientist, you may want to […]
( 7
min )
Enterprise customers have multiple lines of businesses (LOBs) and groups and teams within them. These customers need to balance governance, security, and compliance against the need for machine learning (ML) teams to quickly access their data science environments in a secure manner. These enterprise customers that are starting to adopt AWS, expanding their footprint on […]
( 11
min )
Amazon SageMaker Studio can help you build, train, debug, deploy, and monitor your models and manage your machine learning (ML) workflows. Amazon SageMaker Pipelines enables you to build a secure, scalable, and flexible MLOps platform within Studio. In this post, we explain how to run PySpark processing jobs within a pipeline. This enables anyone that […]
( 9
min )
RStudio on Amazon SageMaker is the first fully managed cloud-based Posit Workbench (formerly known as RStudio Workbench). RStudio on Amazon SageMaker removes the need for you to manage the underlying Posit Workbench infrastructure, so your teams can concentrate on producing value for your business. You can quickly launch the familiar RStudio integrated development environment (IDE) […]
( 10
min )
Announcements Redefining “No-Code” Development Platforms I recently watched a video from Blizzard Entertainment Game Director Wyatt Cheng on ChatGPT’s ability to create a simple video game from scratch. While the art assets were not created by ChatGPT, the AI program Midjourney created them using rough sketches and text prompts. Cheng created this challenge for… Read More »DSC Weekly 11 April 2023 – Redefining “No-Code” Development Platforms
The post DSC Weekly 11 April 2023 – Redefining “No-Code” Development Platforms appeared first on Data Science Central.
( 19
min )
Modern IT companies widely use virtualization due to advantages such as scalability, rational consumption of resources, and convenient backup. This article explains how Policy-Based Data Protection, a feature in NAKIVO Backup & Replication software, works, makes managing VM data protection more accessible, and outlines its benefits. What Is Policy-Based Data Protection? Policy-Based Data Protection is… Read More »VM Data Protection: Automate VM Backup and Replication in a Few Clicks
The post VM Data Protection: Automate VM Backup and Replication in a Few Clicks appeared first on Data Science Central.
( 28
min )
The digital landscape today is rapidly evolving, and businesses now face an unprecedented array of cyber threats putting sensitive data, financial assets, and even their reputation at risk.
The post Machine Learning and AI: The Future of SIEM Alternatives in Cybersecurity appeared first on Data Science Central.
( 21
min )
In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models using Amazon SageMaker JumpStart. Today, we are excited to introduce a new feature that enables users to inpaint images with Stable Diffusion models. Inpainting refers to the process of replacing a portion of an image with another image […]
( 10
min )
You don’t have to be an expert in machine learning (ML) to appreciate the value of large language models (LLMs). Better search results, image recognition for the visually impaired, creating novel designs from text, and intelligent chatbots are just some examples of how these models are facilitating various applications and tasks. ML practitioners keep improving […]
( 10
min )
In the first blog post in this series, Cloud Intelligence/AIOps – Infusing AI into Cloud Computing Systems, we presented a brief overview of Microsoft’s research on Cloud Intelligence/AIOps (AIOps), which innovates AI and machine learning (ML) technologies to help design, build, and operate complex cloud platforms and services effectively and efficiently at scale. As cloud […]
The post Building toward more autonomous and proactive cloud technologies with AI appeared first on Microsoft Research.
( 16
min )
Delve into digital healthcare trends and examine how automated data entry is revolutionizing patient data management, decision-making, and care delivery.
The post Digital Healthcare Trends: Emergence of Automated Data Entry in Healthcare appeared first on Data Science Central.
( 20
min )
This paper presents a combination of machine learning techniques to enable
prompt evaluation of retired electric vehicle batteries as to either retain
those batteries for a second-life application and extend their operation beyond
the original and first intent or send them to recycle facilities. The proposed
algorithm generates features from available battery current and voltage
measurements with simple statistics, selects and ranks the features using
correlation analysis, and employs Gaussian Process Regression enhanced with
bagging. This approach is validated over publicly available aging datasets of
more than 200 cells with slow and fast charging, with different cathode
chemistries, and for diverse operating conditions. Promising results are
observed based on multiple training-test partitions, wherein the mean of Root
Mean Squared Percent Error and Mean Percent Error performance errors are found
to be less than 1.48% and 1.29%, respectively, in the worst-case scenarios.
( 2
min )
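The feature-generation and correlation-ranking steps described above can be sketched as follows (a simplified illustration with synthetic data; the paper additionally uses Gaussian Process Regression with bagging, which is omitted here):

```python
import numpy as np

def make_features(current, voltage):
    """Simple per-cycle statistics from current/voltage traces (assumed shape
    [n_cycles, n_samples]); a stand-in for the paper's feature generation."""
    feats = []
    for sig in (current, voltage):
        feats += [sig.mean(axis=1), sig.std(axis=1),
                  sig.min(axis=1), sig.max(axis=1)]
    return np.stack(feats, axis=1)

def rank_by_correlation(X, y):
    """Rank feature columns by absolute Pearson correlation with the target."""
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(corrs))
    return order, corrs

rng = np.random.default_rng(0)
current = rng.normal(size=(50, 100))
voltage = rng.normal(size=(50, 100))
X = make_features(current, voltage)
y = X[:, 0] * 2.0 + rng.normal(scale=0.01, size=50)  # target tied to feature 0
order, corrs = rank_by_correlation(X, y)
# order[0] == 0: the mean-current feature is ranked most relevant
```

The selected top-ranked features would then feed the bagged regressor for second-life capacity prediction.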
We explore the metric and preference learning problem in Hilbert spaces. We
obtain a novel representer theorem for the simultaneous task of metric and
preference learning. Our key observation is that the representer theorem can be
formulated with respect to the norm induced by the inner product inherent in
the problem structure. Additionally, we demonstrate how our framework can be
applied to the task of metric learning from triplet comparisons and show that
it leads to a simple and self-contained representer theorem for this task. In
the case of Reproducing Kernel Hilbert Spaces (RKHS), we demonstrate that the
solution to the learning problem can be expressed using kernel terms, akin to
classical representer theorems.
( 2
min )
Current literature demonstrates that Large Language Models (LLMs) are great
few-shot learners, and prompting significantly increases their performance on a
range of downstream tasks in a few-shot learning setting. An attempt to
automate human-led prompting followed, with some progress achieved. In
particular, subsequent work demonstrates automation can outperform fine-tuning
in certain K-shot learning scenarios.
In this paper, we revisit techniques for automated prompting on six different
downstream tasks and a larger range of K-shot learning settings. We find that
automated prompting does not consistently outperform simple manual prompts. Our
work suggests that, in addition to fine-tuning, manual prompts should be used
as a baseline in this line of research.
( 2
min )
Percolation is an important topic in climate, physics, materials science,
epidemiology, finance, and so on. Prediction of percolation thresholds with
machine learning methods remains challenging. In this paper, we build a
powerful graph convolutional neural network to study the percolation in both
supervised and unsupervised ways. From a supervised learning perspective, the
graph convolutional neural network trains simultaneously and correctly on data
of different lattice types, such as the square and triangular lattices. From
the unsupervised perspective, by combining the graph convolutional neural
network with the confusion method, the percolation threshold can be obtained
from the "W"-shaped performance curve. The finding of this work opens up the
possibility of
building a more general framework that can probe the percolation-related
phenomenon.
( 2
min )
Image segmentation is a fundamental task in the field of imaging and vision.
Supervised deep learning for segmentation has achieved unparalleled success
when sufficient training data with annotated labels are available. However,
annotation is known to be expensive to obtain, especially for histopathology
images, where the target regions usually exhibit high morphological variation
and irregular shapes. Thus, weakly supervised learning with sparse annotations of
points is promising to reduce the annotation workload. In this work, we propose
a contrast-based variational model to generate segmentation results, which
serve as reliable complementary supervision to train a deep segmentation model
for histopathology images. The proposed method considers the common
characteristics of target regions in histopathology images and can be trained
in an end-to-end manner. It can generate more regionally consistent and
smoother boundary segmentation, and is more robust to unlabeled `novel'
regions. Experiments on two different histology datasets demonstrate its
effectiveness and efficiency in comparison to previous models.
( 2
min )
Deep learning has been highly successful in some applications. Nevertheless,
its use for solving partial differential equations (PDEs) has only been of
recent interest with current state-of-the-art machine learning libraries, e.g.,
TensorFlow or PyTorch. Physics-informed neural networks (PINNs) are an
attractive tool for solving partial differential equations based on sparse and
noisy data. Here we extend PINNs to solve obstacle-related PDEs, which present a
great computational challenge because they necessitate numerical methods that
can yield an accurate approximation of the solution that lies above a given
obstacle. The performance of the proposed PINNs is demonstrated in multiple
scenarios for linear and nonlinear PDEs subject to regular and irregular
obstacles.
( 2
min )
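One common way to impose an obstacle constraint u >= psi in a PINN loss, sketched here as an assumption rather than the paper's exact formulation, is a quadratic penalty on the violation that is added to the usual PDE-residual term:

```python
import numpy as np

def obstacle_penalty(u, psi):
    """Penalty enforcing u >= psi: penalize the squared violation max(psi - u, 0).
    In a PINN, this term is added to the PDE residual loss at collocation points."""
    violation = np.maximum(psi - u, 0.0)
    return np.mean(violation ** 2)

u = np.array([1.0, 0.5, -0.2])     # candidate solution values at collocation points
psi = np.array([0.0, 0.0, 0.0])    # obstacle (here: u must stay >= 0)
# Only the third point violates the obstacle, contributing 0.2**2 / 3
```

The penalty is zero wherever the solution lies above the obstacle, so gradients only act on the violating region.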
In this paper, we present a contraction-guided adaptive partitioning
algorithm for improving interval-valued robust reachable set estimates in a
nonlinear feedback loop with a neural network controller and disturbances.
Based on an estimate of the contraction rate of over-approximated intervals,
the algorithm chooses when and where to partition. Then, by leveraging a
decoupling of the neural network verification step and reachability
partitioning layers, the algorithm can provide accuracy improvements for little
computational cost. This approach is applicable with any sufficiently accurate
open-loop interval-valued reachability estimation technique and any method for
bounding the input-output behavior of a neural network. Using contraction-based
robustness analysis, we provide guarantees of the algorithm's performance with
mixed monotone reachability. Finally, we demonstrate the algorithm's
performance through several numerical simulations and compare it with existing
methods in the literature. In particular, we report a sizable improvement in
the accuracy of reachable set estimation in a fraction of the runtime as
compared to state-of-the-art methods.
( 2
min )
Previous work has established that RNNs with an unbounded activation function
have the capacity to count exactly. However, it has also been shown that RNNs
are challenging to train effectively and generally do not learn exact counting
behaviour. In this paper, we focus on this problem by studying the simplest
possible RNN, a linear single-cell network. We conduct a theoretical analysis
of linear RNNs and identify conditions for the models to exhibit exact counting
behaviour. We provide a formal proof that these conditions are necessary and
sufficient. We also conduct an empirical analysis using tasks involving a
Dyck-1-like Balanced Bracket language under two different settings. We observe
that linear RNNs generally do not meet the necessary and sufficient conditions
for counting behaviour when trained with the standard approach. We investigate
how varying the length of training sequences and utilising different target
classes impacts model behaviour during training and the ability of linear RNN
models to effectively approximate the indicator conditions.
( 2
min )
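The exact-counting conditions discussed above can be illustrated with a hand-set single-cell linear RNN (our toy construction, not a trained model): with recurrent weight w = 1 and inputs +/-1, the hidden state tracks bracket depth exactly.

```python
def linear_rnn_count(s, w=1.0, u_open=1.0, u_close=-1.0):
    """Single-cell linear RNN h_t = w * h_{t-1} + u(x_t).
    With w = 1 and u = +/-1 the hidden state counts bracket depth exactly."""
    h, history = 0.0, []
    for ch in s:
        h = w * h + (u_open if ch == "(" else u_close)
        history.append(h)
    # A valid Dyck-1 word never dips below zero and ends at depth 0
    valid = all(d >= 0 for d in history) and h == 0
    return valid, history

valid, history = linear_rnn_count("(()())")
# valid is True; history traces the depths 1, 2, 1, 2, 1, 0
```

The paper's point is that gradient training rarely lands on this exact configuration, even though it exists.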
Estimating the political leanings of social media users is a challenging and
ever more pressing problem given the increase in social media consumption. We
introduce Retweet-BERT, a simple and scalable model to estimate the political
leanings of Twitter users. Retweet-BERT leverages the retweet network structure
and the language used in users' profile descriptions. Our assumptions stem from
patterns of networks and linguistics homophily among people who share similar
ideologies. Retweet-BERT demonstrates competitive performance against other
state-of-the-art baselines, achieving 96%-97% macro-F1 on two recent Twitter
datasets (a COVID-19 dataset and a 2020 United States presidential elections
dataset). We also perform manual validation to validate the performance of
Retweet-BERT on users not in the training data. Finally, in a case study of
COVID-19, we illustrate the presence of political echo chambers on Twitter and
show that it exists primarily among right-leaning users. Our code is
open-sourced and our data is publicly available.
( 3
min )
Generating synthetic data through generative models is gaining interest in
the ML community and beyond. In the past, synthetic data was often regarded as
a means to private data release, but a surge of recent papers explore how its
potential reaches much further than this -- from creating more fair data to
data augmentation, and from simulation to text generated by ChatGPT. In this
perspective we explore whether, and how, synthetic data may become a dominant
force in the machine learning world, promising a future where datasets can be
tailored to individual needs. Just as importantly, we discuss which fundamental
challenges the community needs to overcome for wider relevance and application
of synthetic data -- the most important of which is quantifying how much we can
trust any finding or prediction drawn from synthetic data.
( 2
min )
The rapid mutation of the influenza virus threatens public health.
Reassortment among viruses with different hosts can lead to a fatal pandemic.
However, it is difficult to detect the original host of the virus during or
after an outbreak as influenza viruses can circulate between different species.
Therefore, early and rapid detection of the viral host would help reduce the
further spread of the virus. We use various machine learning models with
features derived from the position-specific scoring matrix (PSSM) and features
learned from word embedding and word encoding to infer the origin host of
viruses. The results show that the PSSM-based model reaches an MCC of around
95% and an F1 of around 96%, while the model with word embedding reaches an
MCC of around 96% and an F1 of around 97%.
( 3
min )
Hierarchical reinforcement learning is a promising approach that uses
temporal abstraction to solve complex long-horizon problems. However,
simultaneously learning a hierarchy of policies is unstable, as it is
challenging to train the higher-level policy when the lower-level primitive is
non-stationary. In this paper, we propose a novel hierarchical algorithm that
generates a curriculum of achievable subgoals for evolving lower-level
primitives using reinforcement learning and imitation learning. The lower-level
primitive periodically performs data relabeling on a handful of expert
demonstrations using our primitive-informed parsing approach. We provide
expressions to bound the sub-optimality of our method and develop a practical
algorithm for hierarchical reinforcement learning. Since our approach uses a
handful of expert demonstrations, it is suitable for most robotic control
tasks. Experimental evaluation on complex maze navigation and robotic
manipulation environments show that inducing hierarchical curriculum learning
significantly improves sample efficiency, and results in efficient goal
conditioned policies for solving temporally extended tasks.
( 2
min )
This paper describes our submission to Task 10 at SemEval 2023-Explainable
Detection of Online Sexism (EDOS), divided into three subtasks. The recent rise
in social media platforms has seen an increase in disproportionate levels of
sexism experienced by women on social media platforms. This has made detecting
and explaining online sexist content more important than ever to make social
media safer and more accessible for women. Our approach consists of
experimenting and finetuning BERT-based models and using a Majority Voting
ensemble model that outperforms individual baseline model scores. Our system
achieves a macro F1 score of 0.8392 for Task A, 0.6092 for Task B, and 0.4319
for Task C.
( 2
min )
Multilabel ranking is a central task in machine learning with widespread
applications to web search, news stories, recommender systems, etc. However,
the most fundamental question of learnability in a multilabel ranking setting
remains unanswered. In this paper, we characterize the learnability of
multilabel ranking problems in both the batch and online settings for a large
family of ranking losses. Along the way, we also give the first equivalence
class of ranking losses based on learnability.
( 2
min )
Variational autoencoder (VAE) architectures have the potential to develop
reduced-order models (ROMs) for chaotic fluid flows. We propose a method for
learning compact and near-orthogonal ROMs using a combination of a $\beta$-VAE
and a transformer, tested on numerical data from a two-dimensional viscous flow
in both periodic and chaotic regimes. The $\beta$-VAE is trained to learn a
compact latent representation of the flow velocity, and the transformer is
trained to predict the temporal dynamics in latent space. Using the $\beta$-VAE
to learn disentangled representations in latent-space, we obtain a more
interpretable flow model with features that resemble those observed in the
proper orthogonal decomposition, but with a more efficient representation.
Using Poincar\'e maps, we show that our method can capture the
underlying dynamics of the flow, outperforming other prediction models. The
proposed method has potential applications in other fields such as weather
forecasting, structural dynamics or biomedical engineering.
( 3
min )
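The $\beta$-VAE objective underlying the ROM above combines a reconstruction term with a $\beta$-weighted KL term; a minimal numerical sketch (the generic formulation with a diagonal-Gaussian posterior, not the paper's exact architecture):

```python
import numpy as np

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """beta-VAE objective: reconstruction error plus beta-weighted KL divergence
    between the Gaussian posterior N(mu, exp(logvar)) and the standard normal."""
    recon = np.mean((x - x_hat) ** 2)
    kl = -0.5 * np.mean(1 + logvar - mu ** 2 - np.exp(logvar))
    return recon + beta * kl

x = np.ones(8)
x_hat = np.ones(8) * 0.9
mu = np.zeros(4)
logvar = np.zeros(4)
loss = beta_vae_loss(x, x_hat, mu, logvar)
# KL is zero when the posterior matches the prior, so loss == 0.01 (recon only)
```

Raising beta pressures the latent dimensions toward the prior, which is what drives the disentangled, near-orthogonal representations described above.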
We study the influence of different activation functions in the output layer
of deep neural network models for soft and hard label prediction in the
learning with disagreement task. In this task, the goal is to quantify the
amount of disagreement via predicting soft labels. To predict the soft labels,
we use BERT-based preprocessors and encoders and vary the activation function
used in the output layer, while keeping other parameters constant. The soft
labels are then used for the hard label prediction. The activation functions
considered are sigmoid as well as a step-function that is added to the model
post-training and a sinusoidal activation function, which is introduced for the
first time in this paper.
( 2
min )
We study entropy-regularized constrained Markov decision processes (CMDPs)
under the soft-max parameterization, in which an agent aims to maximize the
entropy-regularized value function while satisfying constraints on the expected
total utility. By leveraging the entropy regularization, our theoretical
analysis shows that its Lagrangian dual function is smooth and the Lagrangian
duality gap can be decomposed into the primal optimality gap and the constraint
violation. Furthermore, we propose an accelerated dual-descent method for
entropy-regularized CMDPs. We prove that our method achieves the global
convergence rate $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and
the constraint violation for entropy-regularized CMDPs. A discussion about a
linear convergence rate for CMDPs with a single constraint is also provided.
( 2
min )
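In sketched notation (our own symbols, not necessarily the paper's), the Lagrangian whose dual is descended on has the form

```latex
\mathcal{L}(\theta, \lambda)
  = V^{\theta}_{r,\tau}(\rho) + \lambda \left( V^{\theta}_{g}(\rho) - b \right),
\qquad
D(\lambda) = \max_{\theta} \mathcal{L}(\theta, \lambda),
```

where $V^{\theta}_{r,\tau}$ is the entropy-regularized value function with temperature $\tau$, $V^{\theta}_{g}$ the expected total utility, and $b$ the constraint threshold. The smoothness of $D(\lambda)$ induced by the entropy regularization is what makes the accelerated dual-descent scheme applicable.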
Existing contrastive learning methods for anomalous sound detection refine
the audio representation of each audio sample by using the contrast between the
samples' augmentations (e.g., with time or frequency masking). However, they
might be biased by the augmented data, due to the lack of physical properties
of machine sound, thereby limiting the detection performance. This paper uses
contrastive learning to refine audio representations for each machine ID,
rather than for each audio sample. The proposed two-stage method uses
contrastive learning to pretrain the audio representation model by
incorporating machine ID and a self-supervised ID classifier to fine-tune the
learnt model, while enhancing the relation between audio features from the same
ID. Experiments show that our method outperforms the state-of-the-art methods
using contrastive learning or self-supervised classification in overall anomaly
detection performance and stability on DCASE 2020 Challenge Task2 dataset.
( 2
min )
In graph neural networks (GNNs), both node features and labels are examples
of graph signals, a key notion in graph signal processing (GSP). While it is
common in GSP to impose signal smoothness constraints in learning and
estimation tasks, it is unclear how this can be done for discrete node labels.
We bridge this gap by introducing the concept of distributional graph signals.
In our framework, we work with the distributions of node labels instead of
their values and propose notions of smoothness and non-uniformity of such
distributional graph signals. We then propose a general regularization method
for GNNs that allows us to encode distributional smoothness and non-uniformity
of the model output in semi-supervised node classification tasks. Numerical
experiments demonstrate that our method can significantly improve the
performance of most base GNN models in different problem settings.
( 2
min )
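A Dirichlet-energy-style smoothness measure for such distributional graph signals can be sketched as follows (our generic illustration on a toy graph; the paper's precise smoothness notion may differ):

```python
import numpy as np

def distributional_smoothness(P, A):
    """Dirichlet energy of node label distributions P (rows are per-node
    distributions) w.r.t. the graph Laplacian L = D - A: equals the sum of
    ||P[i] - P[j]||^2 over edges (i, j)."""
    L = np.diag(A.sum(axis=1)) - A
    return np.trace(P.T @ L @ P)

# Path graph 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
P_smooth = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
P_rough  = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]])
# Identical neighboring distributions incur zero penalty;
# alternating ones are penalized
```

A regularizer of this shape can be added to the training loss so that the GNN's output distributions vary smoothly across edges.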
Keyword spotting systems continuously process audio streams to detect
keywords. One of the most challenging tasks in designing such systems is to
reduce False Alarm (FA) which happens when the system falsely registers a
keyword despite the keyword not being uttered. In this paper, we propose a
simple yet elegant solution to this problem that follows from the law of total
probability. We show that existing deep keyword spotting mechanisms can be
improved by Successive Refinement, where the system first classifies whether
the input audio is speech or not, followed by whether the input is keyword-like
or not, and finally classifies which keyword was uttered. We show across
multiple models with size ranging from 13K parameters to 2.41M parameters, the
successive refinement technique reduces FA by up to a factor of 8 on in-domain
held-out FA data, and up to a factor of 7 on out-of-domain (OOD) FA data.
Further, our proposed approach is "plug-and-play" and can be applied to any
deep keyword spotting model.
( 2
min )
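The successive-refinement factorization from the law of total probability can be written out directly (stage probabilities below are hypothetical placeholder values, not model outputs):

```python
import numpy as np

def refined_keyword_probs(p_speech, p_keyword_like, p_keyword):
    """Chain the three stages via the law of total probability:
    P(keyword k) = P(speech) * P(keyword-like | speech) * P(k | keyword-like)."""
    return p_speech * p_keyword_like * np.asarray(p_keyword)

# Stage outputs for one audio frame
probs = refined_keyword_probs(0.99, 0.9, [0.7, 0.2, 0.1])
# A noise frame that fools the final classifier is suppressed upstream,
# which is the mechanism behind the reported false-alarm reduction
noise = refined_keyword_probs(0.05, 0.5, [0.9, 0.05, 0.05])
```

Because any single low-probability stage drives the product down, non-speech audio rarely triggers a keyword even when the final classifier is confident.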
Prediction of chemical shift in NMR using machine learning methods is
typically done with the maximum amount of data available to achieve the best
results. In some cases, such large amounts of data are not available, e.g. for
heteronuclei. We demonstrate a novel machine learning model which is able to
achieve good results with comparatively low amounts of data. We show this by
predicting 19F and 13C NMR chemical shifts of small molecules in specific
solvents.
( 2
min )
We propose a framework for the design of feedback controllers that combines
the optimization-driven and model-free advantages of deep reinforcement
learning with the stability guarantees provided by using the Youla-Kucera
parameterization to define the search domain. Recent advances in behavioral
systems allow us to construct a data-driven internal model; this enables an
alternative realization of the Youla-Kucera parameterization based entirely on
input-output exploration data. Using a neural network to express a
parameterized set of nonlinear stable operators enables seamless integration
with standard deep learning libraries. We demonstrate the approach on a
realistic simulation of a two-tank system.
( 2
min )
In recent years, reinforcement learning (RL) has emerged as a popular
approach for solving sequence-based tasks in machine learning. However, finding
suitable alternatives to RL remains an exciting and innovative research area.
One such alternative that has garnered attention is the Non-Axiomatic Reasoning
System (NARS), which is a general-purpose cognitive reasoning framework. In
this paper, we delve into the potential of NARS as a substitute for RL in
solving sequence-based tasks. To investigate this, we conduct a comparative
analysis of the performance of ONA as an implementation of NARS and
$Q$-Learning in various environments that were created using OpenAI Gym.
The environments have different difficulty levels, ranging from simple to
complex. Our results demonstrate that NARS is a promising alternative to RL,
with competitive performance in diverse environments, particularly in
non-deterministic ones.
( 2
min )
Anomalies are often indicators of malfunction or inefficiency in various
systems such as manufacturing, healthcare, finance, and surveillance, to name a
few. While the literature is abundant in effective detection algorithms due to
this practical relevance, autonomous anomaly detection is rarely used in
real-world scenarios. Especially in high-stakes applications, a
human-in-the-loop is often involved in processes beyond detection such as
verification and troubleshooting. In this work, we introduce ALARM (for
Analyst-in-the-Loop Anomaly Reasoning and Management); an end-to-end framework
that supports the anomaly mining cycle comprehensively, from detection to
action. Besides unsupervised detection of emerging anomalies, it offers anomaly
explanations and an interactive GUI for human-in-the-loop processes -- visual
exploration, sense-making, and ultimately action-taking via designing new
detection rules -- that help close ``the loop'' as the new rules complement
rule-based supervised detection, typical of many deployed systems in practice.
We demonstrate ALARM's efficacy through a series of case studies with fraud
analysts from the financial industry.
( 2
min )
We consider adaptive decision-making problems where an agent optimizes a
cumulative performance objective by repeatedly choosing among a finite set of
options. Compared to the classical prediction-with-expert-advice set-up, we
consider situations where losses are constrained and derive algorithms that
exploit the additional structure in optimal and computationally efficient ways.
Our algorithms and analysis are instance-dependent, that is, suboptimal
choices of the environment are exploited and reflected in our regret bounds.
The constraints handle general dependencies between losses (even across time),
and are flexible enough to also account for a loss budget, which the
environment is not allowed to exceed. The performance of the resulting
algorithms is highlighted in two numerical examples, which include a nonlinear
and online system identification task.
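The classical prediction-with-expert-advice setting that this work departs from is typically handled with the exponential-weights (Hedge) forecaster; the sketch below shows that standard baseline with bounded losses, not the paper's constrained, instance-dependent algorithm:

```python
import numpy as np

def hedge(losses, eta):
    """Exponential-weights (Hedge) forecaster over K options.

    losses: (T, K) array of bounded losses in [0, 1].
    Returns the cumulative expected loss of the forecaster."""
    w = np.ones(losses.shape[1])
    total = 0.0
    for loss in losses:
        p = w / w.sum()           # play the normalized weight vector
        total += p @ loss         # expected loss this round
        w *= np.exp(-eta * loss)  # downweight options with high loss
    return total

rng = np.random.default_rng(1)
T, K = 2000, 5
losses = rng.random((T, K))
losses[:, 2] *= 0.3               # option 2 is consistently better
best = losses.sum(axis=0).min()   # best single option in hindsight
eta = np.sqrt(2 * np.log(K) / T)  # standard tuning for bounded losses
regret = hedge(losses, eta) - best
print(regret / T)                 # per-round regret shrinks as O(sqrt(log K / T))
```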
( 2
min )
We are developing a virtual coaching system that helps patients adhere to
behavior change interventions (BCI). Our proposed system predicts whether a
patient will perform the targeted behavior and uses counterfactual examples
with feature control to guide personalization of BCI. We evaluated our
prediction model using simulated patient data with varying levels of
receptivity to intervention.
( 2
min )
The past decade has witnessed rapid progress in AI research since the
breakthrough in deep learning. AI technology has been applied in almost every
field; therefore, technical and non-technical end-users must understand these
technologies to exploit them. However, existing materials are designed for
experts, whereas non-technical users need appealing materials that deliver complex
ideas in easy-to-follow steps. One notable tool that fits such a profile is
scrollytelling, an approach to storytelling that provides readers with a
natural and rich experience at the reader's pace, along with in-depth
interactive explanations of complex concepts. Hence, this work proposes a novel
visualization design for creating a scrollytelling that can effectively explain
an AI concept to non-technical users. As a demonstration of our design, we
created a scrollytelling to explain the Siamese Neural Network for the visual
similarity matching problem. Our approach helps create a visualization valuable
for a short-timeline situation like a sales pitch. The results show that the
visualization based on our novel design helps improve non-technical users'
perception and machine learning concept knowledge acquisition compared to
traditional materials like online articles.
( 2
min )
We explore the metric and preference learning problem in Hilbert spaces. We
obtain a novel representer theorem for the simultaneous task of metric and
preference learning. Our key observation is that the representer theorem can be
formulated with respect to the norm induced by the inner product inherent in
the problem structure. Additionally, we demonstrate how our framework can be
applied to the task of metric learning from triplet comparisons and show that
it leads to a simple and self-contained representer theorem for this task. In
the case of Reproducing Kernel Hilbert Spaces (RKHS), we demonstrate that the
solution to the learning problem can be expressed using kernel terms, akin to
classical representer theorems.
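For reference, the classical RKHS representer theorem that the kernel-term result parallels states that the regularized empirical risk minimizer lies in the span of the kernel sections at the training points:

```latex
f^\star = \operatorname*{arg\,min}_{f \in \mathcal{H}}
  \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)
  + \lambda \lVert f \rVert_{\mathcal{H}}^{2}
\quad \Longrightarrow \quad
f^\star(\cdot) = \sum_{i=1}^{n} \alpha_i \, k(x_i, \cdot),
```

so the infinite-dimensional optimization reduces to finding the $n$ coefficients $\alpha_i$.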
( 2
min )
Tomographic reconstruction, despite its revolutionary impact on a wide range
of applications, suffers from its ill-posed nature in that there is no unique
solution because of limited and noisy measurements. Therefore, in the absence
of ground truth, quantifying the solution quality is highly desirable but
under-explored. In this work, we address this challenge through Gaussian
process modeling to flexibly and explicitly incorporate prior knowledge of
sample features and experimental noises through the choices of the kernels and
noise models. Our proposed method yields not only comparable reconstruction to
existing practical reconstruction methods (e.g., regularized iterative solver
for inverse problem) but also an efficient way of quantifying solution
uncertainties. We demonstrate the capabilities of the proposed approach on
various images and show its unique capability of uncertainty quantification in
the presence of various noises.
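Gaussian process regression with explicit kernel and noise choices yields a closed-form posterior mean and pointwise variance, which is what enables the uncertainty quantification described above. A minimal one-dimensional numpy sketch (the kernel, length scale, and noise level here are illustrative, not the paper's choices):

```python
import numpy as np

def rbf(a, b, ell=0.2, sf=1.0):
    """Squared-exponential (RBF) kernel between 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(xs, ys, xq, noise=1e-2):
    """Posterior mean and pointwise variance of a GP with RBF kernel."""
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(xs, xq)
    Kss = rbf(xq, xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ys))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v * v, axis=0)
    return mean, var

xs = np.linspace(0, 1, 8)           # sparse "measurements"
ys = np.sin(2 * np.pi * xs)
xq = np.linspace(0, 1, 50)
mean, var = gp_posterior(xs, ys, xq)
# var is small near measurements and reverts to the prior far from them
```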
( 2
min )
Multilabel ranking is a central task in machine learning with widespread
applications to web search, news stories, recommender systems, etc. However,
the most fundamental question of learnability in a multilabel ranking setting
remains unanswered. In this paper, we characterize the learnability of
multilabel ranking problems in both the batch and online settings for a large
family of ranking losses. Along the way, we also give the first equivalence
class of ranking losses based on learnability.
( 2
min )
I was reading this article saying that machine learning models are getting too much popularity even though they can't truly comprehend anything, and that we should focus on other types of artificial intelligence, is what I understood from it. The false promise of ChatGPT | The Straits Times
The 4 types of artificial intelligence are reactive machines, limited memory, theory of mind, and self-aware, according to this link: 4 Types of Artificial Intelligence – BMC Software | Blogs. From what I understood, machine learning would be classified under limited memory.
However, how would you train a theory of mind AI model? Wouldn't it involve machine learning too?
submitted by /u/Kuhle_Brise
( 53
min )
Automatic Image Labeling!
Firstly, we would like to express our utmost gratitude to the creators of Segment-Anything for open-sourcing an exceptional zero-shot segmentation model, here's the github link for segment-anything: https://github.com/facebookresearch/segment-anything
Next, we are thrilled to introduce our extended project based on Segment-Anything. We named it Grounded-Segment-Anything, here's our github repo:
https://github.com/IDEA-Research/Grounded-Segment-Anything
In Grounded-Segment-Anything, we combine Segment-Anything with three strong zero-shot models to build a pipeline for an automatic annotation system, and it shows really impressive results!
We combine the following models:
- BLIP: The Powerful Image Captioning Model
- Grounding DINO: The S…
( 47
min )
Article: https://github.com/noisrucer/deep-learning-papers/blob/master/Swin-Transformer/swin_transformer.ipynb
I wrote a complete guide to the Swin Transformer and a detailed implementation guide with PyTorch.
Hope it helps someone!
submitted by /u/JasonTheCoders
( 43
min )
D-Adaption - https://github.com/facebookresearch/dadaptation
Has anyone had success using this for RL? Seems like it could be useful if it works, but I'd like to hear feedback from people who may have tried it already.
submitted by /u/jarym
( 42
min )
Heya there. A month or so ago I read for the first time about the gaze-redirecting AI technology provided by Nvidia; I think it's called Maxine (or Maxine is the program through which you can achieve this). However, I have an AMD card, so I couldn't run it.
I found a GitHub page of a young coder who, apparently, was able to achieve such a thing before Nvidia came out with its software.
However, I haven't been able to install it yet because I'm not a coder and the instructions aren't crystal clear to me. They seem written for people who already know about these sorts of programs.
Here is the page: https://github.com/chihfanhsu/gaze_correction
Please let me know if you manage to install it and how you did it. You might DM me as well if you want!
submitted by /u/heldex
( 43
min )
I was wondering how people make these videos. I wanted to make one myself because it would be really funny, but I'm not sure exactly how it works. Does anyone know?
link for example: https://www.youtube.com/watch?v=li_OKCpPxM4
submitted by /u/Void_44
( 43
min )
All are welcome :) just a bit of fun...
https://chat.whatsapp.com/BVqzerznn226l41xxi0oNC
submitted by /u/140BPMMaster
( 43
min )
We release Datasynth, a pipeline for synthetic data generation and normalization operations using LangChain and LLM APIs. Using Datasynth, you can generate absolutely synthetic datasets to train a task-specific model you can run on your own GPU.
For testing, we generated synthetic datasets for names, prices, and addresses, then trained a Seq2Seq model for evaluation. Initial models for standardization are available on HuggingFace.
Public code is available on GitHub
submitted by /u/tobiadefami
( 44
min )
Looking through ICLR and CVPR papers, I came across a couple of papers that broke the dual submission policy and eventually got accepted in CVPR. With all the quiet talk about collusion rings and rigged reviews, does nobody care about the dual submission policy anymore?
Here is an example paper: [1] submitted to ICLR on Sep 22, withdrawn from ICLR on Nov 16 [2], but it was already submitted to CVPR on Nov 4 [3].
[1] Learning Rotation-Equivariant Features for Visual Correspondence - https://arxiv.org/abs/2303.15472
[2] https://openreview.net/forum?id=GCF6ZOA6Npk
[3] https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers
submitted by /u/redlow0992
( 45
min )
This paper proposes an extension of principal component analysis for Gaussian
process (GP) posteriors, denoted by GP-PCA. Since GP-PCA estimates a
low-dimensional space of GP posteriors, it can be used for meta-learning, which
is a framework for improving the performance of target tasks by estimating a
structure of a set of tasks. The issue is how to define the structure of a set
of GPs with infinite-dimensional parameters, such as a coordinate system and a
divergence. In this study, we reduce the infinite-dimensionality of GPs to the
finite-dimensional case under the information-geometric framework by
considering a space of GP posteriors that share the same prior. In addition, we
propose an approximation method of GP-PCA based on variational inference and
demonstrate the effectiveness of GP-PCA as meta-learning through experiments.
( 2
min )
We consider the sequential anomaly detection problem in the one-class setting
when only the anomalous sequences are available and propose an adversarial
sequential detector by solving a minimax problem to find an optimal detector
against the worst-case sequences from a generator. The generator captures the
dependence in sequential events using the marked point process model. The
detector sequentially evaluates the likelihood of a test sequence and compares
it with a time-varying threshold, also learned from data through the minimax
problem. We demonstrate our proposed method's good performance using numerical
experiments on simulations and proprietary large-scale credit card fraud
datasets. The proposed method is generally applicable to detecting anomalous
sequences.
( 2
min )
In this work, we derive sharp non-asymptotic deviation bounds for weighted
sums of Dirichlet random variables. These bounds are based on a novel integral
representation of the density of a weighted Dirichlet sum. This representation
allows us to obtain a Gaussian-like approximation for the sum distribution
using geometry and complex analysis methods. Our results generalize similar
bounds for the Beta distribution obtained in the seminal paper Alfers and
Dinges [1984]. Additionally, our results can be considered a sharp
non-asymptotic version of the inverse of Sanov's theorem studied by Ganesh and
O'Connell [1999] in the Bayesian setting. Based on these results, we derive new
deviation bounds for the Dirichlet process posterior means with application to
Bayesian bootstrap. Finally, we apply our estimates to the analysis of the
Multinomial Thompson Sampling (TS) algorithm in multi-armed bandits and
significantly sharpen the existing regret bounds by making them independent of
the size of the arms distribution support.
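The Multinomial Thompson Sampling algorithm analyzed above maintains a Dirichlet posterior over each arm's outcome distribution, sampling a distribution per arm each round and playing the apparent best. A minimal sketch (the arm probabilities, outcome values, and uniform priors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three arms; each pull yields one of three outcomes with arm-specific
# probabilities, and the reward is the outcome's value in {0, 0.5, 1}.
values = np.array([0.0, 0.5, 1.0])
probs = np.array([[0.6, 0.3, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.1, 0.3, 0.6]])   # arm 2 has the best mean reward

counts = np.ones((3, 3))              # Dirichlet(1, 1, 1) priors
pulls = np.zeros(3, dtype=int)

for t in range(3000):
    # Thompson step: sample an outcome distribution per arm from its
    # Dirichlet posterior, then play the arm with the best sampled mean
    sampled_means = np.array([rng.dirichlet(c) @ values for c in counts])
    a = int(np.argmax(sampled_means))
    outcome = rng.choice(3, p=probs[a])
    counts[a, outcome] += 1           # conjugate posterior update
    pulls[a] += 1

print(pulls)  # the best arm dominates the pull counts
```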
( 2
min )
The introduction of embedding techniques has significantly advanced the
Natural Language Processing field. Many of the proposed solutions target
word-level encoding; in recent years, however, new mechanisms to treat
information at higher levels of aggregation, such as the sentence and document
levels, have emerged. With this work we specifically address the sentence
embedding problem, presenting the Static Fuzzy Bag-of-Words (SFBoW) model. Our
model is a refinement of the Fuzzy Bag-of-Words approach, providing sentence
embeddings with a predefined dimension. SFBoW achieves competitive performance
in Semantic Textual Similarity benchmarks, while requiring low computational
resources.
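The fuzzy bag-of-words idea can be sketched as clipped-similarity memberships of words to a fixed basis of terms, summed over the sentence, which yields a fixed-size embedding. This is a simplified reading for illustration, not the exact SFBoW model:

```python
import numpy as np

def fuzzy_bow(word_vecs, basis):
    """Fuzzy bag-of-words sketch: each word contributes its (clipped cosine)
    membership to every basis term; the sentence embedding is the sum.

    word_vecs: (n_words, d); basis: (k, d). Returns a (k,) vector."""
    wn = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    bn = basis / np.linalg.norm(basis, axis=1, keepdims=True)
    membership = np.clip(wn @ bn.T, 0.0, None)  # fuzzy membership in [0, 1]
    return membership.sum(axis=0)

rng = np.random.default_rng(0)
basis = rng.normal(size=(16, 50))   # k = 16 basis terms, d = 50
sent = rng.normal(size=(5, 50))     # a 5-word "sentence" of word vectors
e = fuzzy_bow(sent, basis)
print(e.shape)                      # (16,) regardless of sentence length
```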
( 2
min )
Scenario-based probabilistic forecasts have become vital for decision-makers
in handling intermittent renewable energies. This paper presents a recent
promising deep learning generative approach called denoising diffusion
probabilistic models. It is a class of latent variable models which have
recently demonstrated impressive results in the computer vision community.
However, to our knowledge, there has yet to be a demonstration that they can
generate high-quality samples of load, PV, or wind power time series, crucial
elements to face the new challenges in power systems applications. Thus, we
propose the first implementation of this model for energy forecasting using the
open data of the Global Energy Forecasting Competition 2014. The results
demonstrate this approach is competitive with other state-of-the-art deep
learning generative models, including generative adversarial networks,
variational autoencoders, and normalizing flows.
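The forward (noising) half of a denoising diffusion probabilistic model has a closed form that can be sketched directly; the linear variance schedule below follows the original DDPM paper, while the toy sinusoid merely stands in for a load, PV, or wind time series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear variance schedule, as in the original DDPM paper
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t):
    """Closed-form forward noising: x_t ~ N(sqrt(abar_t) x0, (1 - abar_t) I)."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = np.sin(np.linspace(0, 4 * np.pi, 96))   # a toy "load time series"
x_early, x_late = q_sample(x0, 10), q_sample(x0, 999)
# Early steps barely perturb the signal; late steps are near pure noise
print(np.corrcoef(x0, x_early)[0, 1], alphas_bar[-1])
```

The generative model is then trained to reverse this process step by step, which is the part the paper adapts to scenario-based energy forecasting.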
( 2
min )
Due to the complex behavior arising from non-uniqueness, symmetry, and
bifurcations in the solution space, solving inverse problems of nonlinear
differential equations (DEs) with multiple solutions is a challenging task. To
address this issue, we propose homotopy physics-informed neural networks
(HomPINNs), a novel framework that leverages homotopy continuation and neural
networks (NNs) to solve inverse problems. The proposed framework begins with
the use of an NN to simultaneously approximate known observations and conform to
the constraints of DEs. By utilizing the homotopy continuation method, the
approximation traces the observations to identify multiple solutions and solve
the inverse problem. The experiments involve testing the performance of the
proposed method on one-dimensional DEs and applying it to solve a
two-dimensional Gray-Scott simulation. Our findings demonstrate that the
proposed method is scalable and adaptable, providing an effective solution for
solving DEs with multiple solutions and unknown parameters. Moreover, it has
significant potential for various applications in scientific computing, such as
modeling complex systems and solving inverse problems in physics, chemistry,
biology, etc.
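The homotopy continuation idea at the core of HomPINNs can be illustrated on a scalar equation with multiple solutions: deform an easy problem into the target one while tracking the root with Newton corrections, so that different start points trace to different solutions. A sketch of the continuation idea only, not the PINN framework:

```python
import numpy as np

def f(x):  return x**3 - x          # three real roots: -1, 0, 1
def fp(x): return 3 * x**2 - 1

def homotopy_solve(x0, steps=50, newton_iters=5):
    """Trace H(x, t) = (1 - t) (x - x0) + t f(x) from t = 0 to t = 1.
    At t = 0 the root is trivially x0; at t = 1 it is a root of f."""
    x = x0
    for t in np.linspace(0.0, 1.0, steps)[1:]:
        for _ in range(newton_iters):          # Newton correction at fixed t
            H  = (1 - t) * (x - x0) + t * f(x)
            Hp = (1 - t) + t * fp(x)
            x -= H / Hp
    return x

# Different start points trace to different solutions of the same equation
roots = sorted(homotopy_solve(s) for s in (-2.0, 2.0))
print(roots)
```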
( 3
min )
The accuracy of predictive models for solitary pulmonary nodule (SPN)
diagnosis can be greatly increased by incorporating repeat imaging and medical
context, such as electronic health records (EHRs). However, clinically routine
modalities such as imaging and diagnostic codes can be asynchronous and
irregularly sampled over different time scales, which are obstacles to
longitudinal multimodal learning. In this work, we propose a transformer-based
multimodal strategy to integrate repeat imaging with longitudinal clinical
signatures from routinely collected EHRs for SPN classification. We perform
unsupervised disentanglement of latent clinical signatures and leverage
time-distance scaled self-attention to jointly learn from clinical signature
expressions and chest computed tomography (CT) scans. Our classifier is
pretrained on 2,668 scans from a public dataset and 1,149 subjects with
longitudinal chest CTs, billing codes, medications, and laboratory tests from
EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs
revealed a significant AUC improvement over a longitudinal multimodal baseline
(0.824 vs 0.752 AUC), as well as improvements over a single cross-section
multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741
AUC). This work demonstrates significant advantages with a novel approach for
co-learning longitudinal imaging and non-imaging phenotypes with transformers.
( 3
min )
Deep networks have achieved impressive results on a range of well-curated
benchmark datasets. Surprisingly, their performance remains sensitive to
perturbations that have little effect on human performance. In this work, we
propose a novel extension of Mixup called Robustmix that regularizes networks
to classify based on lower-frequency spatial features. We show that this type
of regularization improves robustness on a range of benchmarks such as
ImageNet-C and Stylized ImageNet. It adds little computational overhead and,
furthermore, does not require a priori knowledge of a large set of image
transformations. We find that this approach further complements recent advances
in model architecture and data augmentation, attaining a state-of-the-art mCE
of 44.8 with an EfficientNet-B8 model and RandAugment, which is a reduction of
16 mCE compared to the baseline.
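One plausible reading of regularizing toward lower-frequency features is to mix two inputs only in their low-frequency band, leaving each input's high-frequency residual intact. The numpy sketch below illustrates that frequency-band mixing idea; it is a hedged approximation, not the exact Robustmix recipe:

```python
import numpy as np

def lowpass(x, cutoff):
    """Keep only spatial frequencies below `cutoff` (fraction of Nyquist)."""
    F = np.fft.fft2(x)
    fy = np.abs(np.fft.fftfreq(x.shape[0]))[:, None]
    fx = np.abs(np.fft.fftfreq(x.shape[1]))[None, :]
    mask = np.sqrt(fy**2 + fx**2) <= cutoff * 0.5
    return np.real(np.fft.ifft2(F * mask))

def band_mixup(x1, x2, lam, cutoff=0.25):
    """Mix two images only in their low-frequency band; the output keeps
    x1's high-frequency residual. Labels would be mixed with the same lam."""
    lo1, lo2 = lowpass(x1, cutoff), lowpass(x2, cutoff)
    return lam * lo1 + (1 - lam) * lo2 + (x1 - lo1)

rng = np.random.default_rng(0)
a, b = rng.normal(size=(32, 32)), rng.normal(size=(32, 32))
m = band_mixup(a, b, lam=0.7)
# m's high-frequency content equals a's; only low frequencies were mixed
```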
( 2
min )
Self-supervised pretraining has been observed to improve performance in
supervised learning tasks in medical imaging. This study investigates the
utility of self-supervised pretraining prior to conducting supervised
fine-tuning for the downstream task of lung sliding classification in M-mode
lung ultrasound images. We propose a novel pairwise relationship that couples
M-mode images constructed from the same B-mode image and investigate the
utility of a data augmentation procedure specific to M-mode lung ultrasound. The
results indicate that self-supervised pretraining yields better performance
than full supervision, most notably for feature extractors not initialized with
ImageNet-pretrained weights. Moreover, we observe that including a vast volume
of unlabelled data results in improved performance on external validation
datasets, underscoring the value of self-supervision for improving
generalizability in automatic ultrasound interpretation. To the authors' best
knowledge, this study is the first to characterize the influence of
self-supervised pretraining for M-mode ultrasound.
( 2
min )
The combined growth of available data and its unstructured nature has
increased interest in natural language processing (NLP) techniques to extract
value from these data assets, since this format is not suitable for
statistical analysis. This work presents a systematic literature review of
state-of-the-art advances using transformer-based methods on electronic medical
records (EMRs) in different NLP tasks. To the best of our knowledge, this work
is unique in providing a comprehensive review of research on transformer-based
methods for NLP applied to the EMR field. In the initial query, 99 articles
were selected from three public databases and filtered into 65 articles for
detailed analysis. The papers were analyzed with respect to the business
problem, NLP task, models and techniques, availability of datasets,
reproducibility of modeling, language, and exchange format. The paper presents
some limitations of current research and some recommendations for further
research.
( 2
min )
Continuous Integration (CI) has become a well-established software
development practice for automatically and continuously integrating code
changes during software development. An increasing number of Machine Learning
(ML) based approaches for automation of CI phases are being reported in the
literature. It is timely and relevant to provide a Systematization of Knowledge
(SoK) of ML-based approaches for CI phases. This paper reports an SoK of
different aspects of the use of ML for CI. Our systematic analysis also
highlights the deficiencies of the existing ML-based solutions that can be
improved for advancing the state-of-the-art.
( 2
min )
We show that hybrid zonotopes offer an equivalent representation of
feed-forward fully connected neural networks with ReLU activation functions.
Our approach demonstrates that the number of binary variables required equals
the total number of neurons in the network and hence grows linearly with the
size of the network. We demonstrate the utility of the hybrid zonotope formulation
through three case studies including nonlinear function approximation, MPC
closed-loop reachability and verification, and robustness of classification on
the MNIST dataset.
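For comparison, the standard big-M mixed-integer encoding of a single ReLU neuron $y = \max(0, x)$ with pre-activation bounds $l \le x \le u$ also uses exactly one binary variable per neuron, which matches the linear growth noted above:

```latex
\begin{aligned}
y &\ge 0, &\qquad y &\ge x, \\
y &\le x - l\,(1 - \delta), &\qquad y &\le u\,\delta, \qquad \delta \in \{0, 1\}.
\end{aligned}
```

When $\delta = 1$ the constraints force $y = x$ (the active case); when $\delta = 0$ they force $y = 0$ (the inactive case).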
( 2
min )
Advances in deep learning models have revolutionized the study of biomolecule
systems and their mechanisms. Graph representation learning, in particular, is
important for accurately capturing the geometric information of biomolecules at
different levels. This paper presents a comprehensive review of the
methodologies used to represent biological molecules and systems as
computer-recognizable objects, such as sequences, graphs, and surfaces.
Moreover, it examines how geometric deep learning models, with an emphasis on
graph-based techniques, can analyze biomolecule data to enable drug discovery,
protein characterization, and biological system analysis. The study concludes
with an overview of the current state of the field, highlighting the challenges
that exist and the potential future research directions.
( 2
min )
A natural way of estimating heteroscedastic label noise in regression is to
model the observed (potentially noisy) target as a sample from a normal
distribution, whose parameters can be learned by minimizing the negative
log-likelihood. This loss has desirable loss attenuation properties, as it can
reduce the contribution of high-error examples. Intuitively, this behavior can
improve robustness against label noise by reducing overfitting. We propose an
extension of this simple and probabilistic approach to classification that has
the same desirable loss attenuation properties. We evaluate the effectiveness
of the method by measuring its robustness against label noise in
classification. We perform enlightening experiments exploring the inner
workings of the method, including sensitivity to hyperparameters, ablation
studies, and more.
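The loss attenuation property is visible directly in the Gaussian negative log-likelihood: the $1/(2\sigma^2)$ factor shrinks the contribution of examples the model assigns high predicted variance. A minimal numeric sketch (the values below are illustrative):

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Negative log-likelihood of y under N(mu, exp(log_var)).
    The 1/(2 sigma^2) factor attenuates the squared error of
    examples the model flags as noisy (large predicted variance)."""
    var = np.exp(log_var)
    return 0.5 * (log_var + (y - mu) ** 2 / var + np.log(2 * np.pi))

y, mu = 3.0, 0.0   # a large-error (possibly mislabeled) example
low_noise  = gaussian_nll(y, mu, log_var=np.log(0.1))
high_noise = gaussian_nll(y, mu, log_var=np.log(10.0))
print(low_noise, high_noise)  # predicting high variance attenuates the loss
```

The log-variance term prevents the trivial escape of predicting infinite variance everywhere, which is what makes the trade-off well-posed.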
( 2
min )
Class imbalance (CI) in classification problems arises when the number of
observations belonging to one class is lower than that of the other classes. Ensemble
learning that combines multiple models to obtain a robust model has been
prominently used with data augmentation methods to address class imbalance
problems. In the last decade, a number of strategies have been added to enhance
ensemble learning and data augmentation methods, along with new methods such as
generative adversarial networks (GANs). A combination of these has been applied
in many studies, but the true rank of different combinations would require a
computational review. In this paper, we present a computational review to
evaluate data augmentation and ensemble learning methods used to address
prominent benchmark CI problems. We propose a general framework that evaluates
10 data augmentation and 10 ensemble learning methods for CI problems. Our
objective was to identify the most effective combination for improving
classification performance on imbalanced datasets. The results indicate that
combinations of data augmentation methods with ensemble learning can
significantly improve classification performance on imbalanced datasets. These
findings have important implications for the development of more effective
approaches for handling imbalanced datasets in machine learning applications.
( 3
min )
This paper presents the Real-time Adaptive and Interpretable Detection (RAID)
algorithm. The novel approach addresses the limitations of state-of-the-art
anomaly detection methods for multivariate dynamic processes, which are
restricted to detecting anomalies within the scope of the model training
conditions. The RAID algorithm adapts to non-stationary effects such as data
drift and change points that may not be accounted for during model development,
resulting in prolonged service life. A dynamic model based on joint probability
distribution handles anomalous behavior detection in a system and the root
cause isolation based on adaptive process limits. The RAID algorithm does not
require changes to existing process automation infrastructures, making it
highly deployable across different domains. Two case studies involving real
dynamic system data demonstrate the benefits of the RAID algorithm, including
change point adaptation, root cause isolation, and improved detection accuracy.
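Adaptive process limits of the kind described above can be sketched with exponentially weighted estimates that follow slow drift while still flagging abrupt deviations. This is an illustrative sketch of the adaptive-limits idea, not the RAID algorithm itself:

```python
import numpy as np

def adaptive_limits(x, lam=0.05, k=3.0):
    """Flag points outside mean +/- k*std, where mean and variance are
    tracked with exponentially weighted updates so the limits follow
    slow drift. Flagged points are excluded from the updates."""
    mu, var = x[0], 1.0
    flags = []
    for v in x:
        out = abs(v - mu) > k * np.sqrt(var)
        flags.append(out)
        if not out:                       # only adapt on normal points
            mu  = (1 - lam) * mu + lam * v
            var = (1 - lam) * var + lam * (v - mu) ** 2
    return np.array(flags)

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)
x[250:] += 0.002 * np.arange(250)        # slow drift: should be tolerated
x[400] += 8.0                            # abrupt spike: should be flagged
flags = adaptive_limits(x)
print(flags[400], flags.mean())
```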
( 2
min )
https://youtu.be/24yjRbBah3w "Why AI art struggles with hands" by Vox
I was watching this video, and I came to the thought, "if this is how AI sees the world, I wonder if this is what it'd be like trying to describe 3D to someone who can only experience 2D and 1D?"
AI reads its own code and we can input pictures and videos to give it information to read, but how do we know what it's seeing or how it's seeing? What is its experience like compared to ours?
Take this post with a grain of salt, I just wanted to put this thought out there for discussion and see what other people would say.
submitted by /u/Ambitious-Prune-9461
( 43
min )
With the advent of high-speed 5G mobile networks, enterprises are more easily positioned than ever with the opportunity to harness the convergence of telecommunications networks and the cloud. As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end-customers […]
( 11
min )
GLIP (https://github.com/microsoft/GLIP), which I feel has flown under the radar, is capable of zero-shot object detection. I threw together a notebook that pairs it with the recently released Segment Anything Model (https://github.com/facebookresearch/segment-anything) to do zero-shot instance segmentation: https://colab.research.google.com/drive/1kfdizAJiD5_t-M6yFBB6t2vzGrYg8SJc
submitted by /u/esmooth
( 43
min )
For years, while still useful, YouTube transcripts have been pretty terrible. No punctuation, poor translations of heavy accents, and generally difficult to comprehend.
Lo and behold, today I watched a video from community favourite Károly Zsolnai-Fehér. You know, the Two Minute Papers guy, and..... hold onto your papers, the transcript is almost flawless: fully punctuated and accurate, even with his heavy accent.
But I can't see any press about this? When did they transition to a new speech-to-text model? What model is it using? Anyone have any insight? Here is the video in question if anyone else is interested. https://www.youtube.com/watch?v=1KQc6zHOmtU
submitted by /u/Wooraah
( 45
min )
A small project I did a while ago.
Based on a prompt, I ask gpt4 to imagine the project name, architecture and the tools it will use.
I then ask it to implement each file in the project.
Most of the time the project won't run, but it's a nice starting point.
Here is the github page: https://github.com/MrNothing/AI-Genie
Note: if you ask for a complex project, it can take a lot of API queries; you have been warned!
Thank you!
submitted by /u/smilefr
( 43
min )
Data is at the heart of machine learning (ML). Including relevant data to comprehensively represent your business problem ensures that you effectively capture trends and relationships so that you can derive the insights needed to drive business decisions. With Amazon SageMaker Canvas, you can now import data from over 40 data sources to be used […]
( 8
min )
This is a joint post by NXP SEMICONDUCTORS N.V. & AWS Machine Learning Solutions Lab (MLSL) Machine learning (ML) is being used across a wide range of industries to extract actionable insights from data to streamline processes and improve revenue generation. In this post, we demonstrate how NXP, an industry leader in the semiconductor sector, […]
( 14
min )
This GFN Thursday explores the many ways GeForce NOW members can play their favorite PC games across the devices they know and love. Plus, seven new games join the GeForce NOW library this week. More Ways to Play GeForce NOW is the ultimate platform for gamers who want to play across more devices than their Read article >
( 5
min )
Heatmaps are widely used to interpret deep neural networks, particularly for
computer vision tasks, and the heatmap-based explainable AI (XAI) techniques
are a well-researched topic. However, most studies concentrate on enhancing the
quality of the generated heatmap or discovering alternate heatmap generation
techniques, and little effort has been devoted to making heatmap-based XAI
automatic, interactive, scalable, and accessible. To address this gap, we
propose a framework that includes two modules: (1) context modelling and (2)
reasoning. We propose a template-based image captioning approach for context
modelling to create text-based contextual information from the heatmap and
input data. The reasoning module leverages a large language model to provide
explanations in combination with specialised knowledge. Our qualitative
experiments demonstrate the effectiveness of our framework and heatmap
captioning approach. The code for the proposed template-based heatmap
captioning approach will be publicly available.
( 2
min )
Stock market forecasting has been a challenging part for many analysts and
researchers. Trend analysis, statistical techniques, and movement indicators
have traditionally been used to predict stock price movements, but text
extraction has emerged as a promising method in recent years. The use of neural
networks, especially recurrent neural networks, is abundant in the literature.
In most studies, the impact of different users was considered equal or ignored,
whereas different users can have differing effects. In the current study, we introduce
TM-vector and then use this vector to train an IndRNN and ultimately model the
market users' behaviour. In the proposed model, TM-vector is simultaneously
trained with both the extracted Twitter features and market information.
Various factors have been used for the effectiveness of the proposed
forecasting approach, including the characteristics of each individual user,
their impact on each other, and their impact on the market, to predict market
direction more accurately. Dow Jones 30 index has been used in current work.
The accuracy obtained for predicting daily stock changes of Apple across
various models is close to 95%, and results for the other stocks are also significant.
Our results indicate the effectiveness of TM-vector in predicting stock market
direction.
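The IndRNN recurrence the model trains on (Li et al., 2018) differs from a vanilla RNN in that each hidden unit carries its own scalar recurrent weight. A toy sketch, where the feature layout (tweet plus market features per day) is an illustrative assumption and not the paper's TM-vector construction:

```python
import numpy as np

# IndRNN cell: h_t = relu(W x_t + u * h_{t-1}), with an elementwise
# (independent) recurrent weight vector u instead of a full recurrent matrix.
rng = np.random.default_rng(0)
n_in, n_hid = 6, 4                          # e.g. tweet + market features
W = rng.normal(scale=0.5, size=(n_hid, n_in))
u = rng.uniform(0.5, 1.0, size=n_hid)       # one recurrent weight per unit

def indrnn(xs):
    h = np.zeros(n_hid)
    for x in xs:
        h = np.maximum(0.0, W @ x + u * h)  # elementwise recurrence
    return h

xs = rng.normal(size=(10, n_in))            # 10 daily feature vectors
print(indrnn(xs).shape)
```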
( 3
min )
Recently, fully-transformer architectures have replaced the de facto
convolutional architecture for the 3D human pose estimation task. In this paper
we propose \textbf{\textit{ConvFormer}}, a novel convolutional transformer that
leverages a new \textbf{\textit{dynamic multi-headed convolutional
self-attention}} mechanism for monocular 3D human pose estimation. We designed
a spatial and temporal convolutional transformer to comprehensively model human
joint relations within individual frames and globally across the motion
sequence. Moreover, we introduce a novel notion of \textbf{\textit{temporal
joints profile}} for our temporal ConvFormer that fuses complete temporal
information immediately for a local neighborhood of joint features. We have
quantitatively and qualitatively validated our method on three common benchmark
datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments have
been conducted to identify the optimal hyper-parameter set. These experiments
demonstrated that we achieved a \textbf{significant parameter reduction
relative to prior transformer models} while attaining State-of-the-Art (SOTA)
or near SOTA on all three datasets. Additionally, we achieved SOTA for Protocol
III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on
all three metrics for the MPI-INF-3DHP dataset and for all three subjects on
HumanEva under Protocol II.
( 2
min )
Machine learning (ML) has become critical for post-acquisition data analysis
in (scanning) transmission electron microscopy, (S)TEM, imaging and
spectroscopy. An emerging trend is the transition to real-time analysis and
closed-loop microscope operation. The effective use of ML in electron
microscopy now requires the development of strategies for microscopy-centered
experiment workflow design and optimization. Here, we discuss the associated
challenges with the transition to active ML, including sequential data analysis
and out-of-distribution drift effects, the requirements for the edge operation,
local and cloud data storage, and theory in the loop operations. Specifically,
we discuss the relative contributions of human scientists and ML agents in the
ideation, orchestration, and execution of experimental workflows and the need
to develop universal hyper languages that can apply across multiple platforms.
These considerations will collectively inform the operationalization of ML in
next-generation experimentation.
( 2
min )
Recently developed text-to-image diffusion models make it easy to edit or
create high-quality images. Their ease of use has raised concerns about the
potential for malicious editing or deepfake creation. Imperceptible
perturbations have been proposed as a means of protecting images from malicious
editing by preventing diffusion models from generating realistic images.
However, we find that the aforementioned perturbations are not robust to JPEG
compression, which poses a major weakness because of the common usage and
availability of JPEG. We discuss the importance of robustness for additive
imperceptible perturbations and encourage alternative approaches to protect
images against editing.
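The failure mode can be illustrated with an assumption-level numpy toy (not real JPEG, which additionally applies a DCT): lossy quantization can erase an additive perturbation that is large enough to survive 8-bit storage but smaller than a coarser quantization step.

```python
import numpy as np

x = np.arange(8) / 31.0                      # clean "pixels" on a 32-level grid
delta = 0.004 * np.sign(np.sin(50 * x))      # tiny additive perturbation
x_adv = x + delta

def quantize(v, levels):
    # uniform quantizer, analogous to the rounding step in lossy compression
    return np.round(v * (levels - 1)) / (levels - 1)

survives_8bit = not np.allclose(quantize(x_adv, 256), quantize(x, 256))
erased_coarse = np.allclose(quantize(x_adv, 32), quantize(x, 32))
print(survives_8bit, erased_coarse)
```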
( 2
min )
We use information-theoretic tools to derive a novel analysis of Multi-source
Domain Adaptation (MDA) from the representation learning perspective.
Concretely, we study joint distribution alignment for supervised MDA with few
target labels and unsupervised MDA with pseudo labels, where the latter is
relatively hard and less commonly studied. We further provide
algorithm-dependent generalization bounds for these two settings, where the
generalization is characterized by the mutual information between the
parameters and the data. Then we propose a novel deep MDA algorithm, implicitly
addressing the target shift through joint alignment. Finally, the mutual
information bounds are extended to this algorithm providing a non-vacuous
gradient-norm estimation. The proposed algorithm has comparable performance to
the state-of-the-art on target-shifted MDA benchmark with improved memory
efficiency.
( 2
min )
We perform an effective-theory analysis of forward-backward signal
propagation in wide and deep Transformers, i.e., residual neural networks with
multi-head self-attention blocks and multilayer perceptron blocks. This
analysis suggests particular width scalings of initialization and training
hyperparameters for these models. We then take up such suggestions, training
Vision and Language Transformers in practical setups.
( 2
min )
We consider the problem of learning multioutput function classes in batch and
online settings. In both settings, we show that a multioutput function class is
learnable if and only if each single-output restriction of the function class
is learnable. This provides a complete characterization of the learnability of
multilabel classification and multioutput regression in both batch and online
settings. As an extension, we also consider multilabel learnability in the
bandit feedback setting and show a similar characterization as in the
full-feedback setting.
( 2
min )
In this paper we consider a new class of RBF (Radial Basis Function) neural
networks, in which smoothing factors are replaced with shifts. We prove under
certain conditions on the activation function that these networks are capable
of approximating any continuous multivariate function on any compact subset of
the $d$-dimensional Euclidean space. For RBF networks with finitely many fixed
centroids we describe conditions guaranteeing approximation with arbitrary
precision.
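Concretely (one plausible reading, stated as an assumption about the notation), the networks considered compute
$$f(x) \;=\; \sum_{i=1}^{N} c_i\, g\big(\lVert x - a_i \rVert + b_i\big), \qquad x \in \mathbb{R}^d,$$
where $a_i$ are the centroids, $c_i$ the output weights, and the shifts $b_i$ take the place of the usual smoothing factors $\sigma_i$ in the classical RBF form $g(\lVert x - a_i\rVert / \sigma_i)$.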
( 2
min )
This paper focuses on optimal unimodal transformation of the score outputs of
a univariate learning model under linear loss functions. We demonstrate that
the optimal mapping between score values and the target region is a rectangular
function. To produce this optimal rectangular fit for the observed samples, we
propose a sequential approach that can update its estimate with each incoming new
sample. Our approach has logarithmic time complexity per iteration and is
optimally efficient.
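To make the rectangular-fit idea concrete, here is a hedged offline sketch (O(n log n) overall, not the paper's logarithmic-per-iteration sequential algorithm): after sorting samples by score, choosing the score interval mapped to 1 reduces to a maximum-sum subarray over per-sample gains (positive for samples that should map to 1, negative otherwise).

```python
# Offline rectangular fit: return the (low, high) score interval that
# maximizes the total gain of the samples it contains (Kadane's algorithm
# over the score-sorted samples).
def rectangular_fit(scores, gains):
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    best, cur, start = (0.0, None, None), 0.0, 0
    for pos, i in enumerate(order):
        if cur <= 0:
            cur, start = 0.0, pos
        cur += gains[i]
        if cur > best[0]:
            best = (cur, start, pos)
    if best[1] is None:
        return None
    return scores[order[best[1]]], scores[order[best[2]]]

scores = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
gains  = [-1,   2,   3,  -1,   2,  -2]
print(rectangular_fit(scores, gains))
```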
( 2
min )
This paper presents a new convergent Plug-and-Play (PnP) algorithm. PnP
methods are efficient iterative algorithms for solving image inverse problems
formulated as the minimization of the sum of a data-fidelity term and a
regularization term. PnP methods perform regularization by plugging a
pre-trained denoiser in a proximal algorithm, such as Proximal Gradient Descent
(PGD). To ensure convergence of PnP schemes, many works study specific
parametrizations of deep denoisers. However, existing results require either
unverifiable or suboptimal hypotheses on the denoiser, or assume restrictive
conditions on the parameters of the inverse problem. Observing that these
limitations can be due to the proximal algorithm in use, we study a relaxed
version of the PGD algorithm for minimizing the sum of a convex function and a
weakly convex one. When plugged with a relaxed proximal denoiser, we show that
the proposed PnP-$\alpha$PGD algorithm converges for a wider range of
regularization parameters, thus allowing more accurate image restoration.
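Schematically (a hedged sketch; the paper's exact relaxed update may differ), PnP-$\alpha$PGD replaces the standard PnP-PGD step $x_{k+1} = D_\sigma\big(x_k - \lambda \nabla f(x_k)\big)$ with the relaxed update
$$x_{k+1} \;=\; (1-\alpha)\, x_k + \alpha\, D_\sigma\big(x_k - \lambda \nabla f(x_k)\big), \qquad \alpha \in (0, 1],$$
where $f$ is the data-fidelity term, $D_\sigma$ the plugged denoiser, and $\alpha = 1$ recovers standard PnP-PGD.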
( 2
min )
We establish disintegrated PAC-Bayesian generalisation bounds for models
trained with gradient descent methods or continuous gradient flows. Contrary to
standard practice in the PAC-Bayesian setting, our result applies to
optimisation algorithms that are deterministic, without requiring any
de-randomisation step. Our bounds are fully computable, depending on the
density of the initial distribution and the Hessian of the training objective
over the trajectory. We show that our framework can be applied to a variety of
iterative optimisation algorithms, including stochastic gradient descent (SGD),
momentum-based schemes, and damped Hamiltonian dynamics.
( 2
min )
Just stating what should be obvious.
https://raygun.com/blog/costly-software-errors-history/
https://en.wikipedia.org/wiki/List_of_software_bugs
I have no doubt that in the next few decades, A.I. will top the list of most expensive software bugs ever.
When A.I. can do superhuman things, like a calculator doing arithmetic, then it will have the power to do major oops.
submitted by /u/Terminator857
( 42
min )
https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/
https://github.com/facebookresearch/segment-anything
Today, we aim to democratize segmentation by introducing the Segment Anything project: a new task, dataset, and model for image segmentation, as we explain in our research paper. We are releasing both our general Segment Anything Model (SAM) and our Segment Anything 1-Billion mask dataset (SA-1B), the largest ever segmentation dataset, to enable a broad set of applications and foster further research into foundation models for computer vision. We are making the SA-1B dataset available for research purposes and the Segment Anything Model is available under a permissive open license (Apache 2.0).
submitted by /u/Sirisian
( 44
min )
You can use Random Projections for dimensional reduction. Allowing small neural networks to process big data. They can be fast too.
https://ai462qqq.blogspot.com/2023/04/random-projections-for-neural-networks.html
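A minimal numpy sketch of the idea (Johnson-Lindenstrauss-style random projection; the dimensions are arbitrary examples): multiplying by a scaled Gaussian matrix reduces dimensionality while approximately preserving pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))             # 100 samples, 1000 features
k = 50
R = rng.normal(size=(1000, k)) / np.sqrt(k)  # random projection matrix
X_low = X @ R                                # reduced data: (100, 50)

# Pairwise distances are roughly preserved (up to ~1/sqrt(k) distortion).
d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(X_low[0] - X_low[1])
print(X_low.shape, round(d_proj / d_orig, 2))
```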
submitted by /u/SeanHaddPS
( 44
min )
The rise of text and semantic search engines has made search easier for ecommerce and retail consumers. Search engines powered by unified text and image can provide extra flexibility in search solutions. You can use both text and images as queries. For example, you have a folder of hundreds of family pictures in […]
( 14
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML). We are excited to announce the launch of Amazon Kendra Featured Results. This new feature makes specific documents or content appear at the top of the search results page whenever a user issues a certain query. You can use Featured Results to improve […]
( 6
min )
Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can. Many publishers have a large library of stock images that they use for their articles. These images can be reused many times for different stories, especially when the […]
( 8
min )
MLPerf remains the definitive measurement for AI performance as an independent, third-party benchmark. NVIDIA’s AI platform has consistently shown leadership across both training and inference since the inception of MLPerf, including the MLPerf Inference 3.0 benchmarks released today. “Three years ago when we introduced A100, the AI world was dominated by computer vision. Generative AI Read article >
( 7
min )
Seems like it could be useful to some others here
https://www.edgeimpulse.com/blog/unveiling-the-new-edge-impulse-python-sdk
submitted by /u/gtj
( 43
min )
Hi guys!
We've released the Code & Gradio demo & Colab demo for our paper, DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (accepted to CVPR 2023).
- Paper: https://arxiv.org/abs/2211.16374
- Project: https://gwang-kim.github.io/datid_3d/
- Code & Gradio Demo: https://github.com/gwang-kim/DATID-3D
- Colab Demo: https://colab.research.google.com/drive/1e9NSVB7x_hjz-nr4K0jO4rfTXILnNGtA?usp=sharing
DATID-3D succeeded in text-guided domain adaptation of 3D-aware generative models while preserving diversity that is inherent in the text prompt as well as enabling high-quality pose-controlled image synthesis with excellent text-image correspondence.
We showcase the demo of text-guided manipulated 3D reconstruction beyond text-guided image manipulation!
submitted by /u/ImBradleyKim
( 45
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra FAQs allow users to upload […]
( 8
min )
Time series are sequences of data points that occur in successive order over some period of time. We often analyze these data points to make better business decisions or gain competitive advantages. An example is Shimamura Music, who used Amazon Forecast to improve shortage rates and increase business efficiency. Another great example is Arneg, who […]
( 7
min )
NVIDIA today recognized a dozen partners in the Americas for their work enabling customers to build and deploy AI applications across a broad range of industries. NVIDIA Partner Network (NPN) Americas Partner of the Year awards were given out to companies in 13 categories covering AI, consulting, distribution, education, healthcare, integration, networking, the public sector, Read article >
( 6
min )
Video editor Patrick Stirling used the Magic Mask feature in Blackmagic Design’s DaVinci Resolve software to create a custom effect that creates textured animations of people, this week In the NVIDIA Studio.
( 6
min )
In his book The Book of Why, Judea Pearl advocates for teaching cause and effect principles to machines in order to enhance their intelligence. The accomplishments of deep learning are essentially just a type of curve fitting, whereas causality could be used to uncover interactions between the systems of the world under various constraints without […]
( 11
min )
The size and complexity of large language models (LLMs) have exploded in the last few years. LLMs have demonstrated remarkable capabilities in learning the semantics of natural language and producing human-like responses. Many recent LLMs are fine-tuned with a powerful technique called instruction tuning, which helps the model perform new tasks or generate responses to […]
( 15
min )
Sometimes you start a blog with a hypothesis in mind, and then that intention changes as you research and realize that your original idea was wrong. Yep, this is one of those blogs. Learning can be fun if you let go of pre-existing dogma and learn along your life journey. I’ve always been curious (a… Read More »Creating Healthy AI Utility Function: Importance of Diversity – Part I
The post Creating Healthy AI Utility Function: Importance of Diversity – Part I appeared first on Data Science Central.
( 22
min )
“DribbleBot” can maneuver a soccer ball on landscapes such as sand, gravel, mud, and snow, using reinforcement learning to adapt to varying ball dynamics.
( 10
min )
The evaluation of explanation methods is a research topic that has not yet
been explored deeply. However, since explainability is supposed to strengthen
trust in artificial intelligence, it is necessary to systematically review and
compare explanation methods in order to confirm their correctness. Until now,
no tool with focus on XAI evaluation exists that exhaustively and speedily
allows researchers to evaluate the performance of explanations of neural
network predictions. To increase transparency and reproducibility in the field,
we therefore built Quantus -- a comprehensive evaluation toolkit in Python
that includes a growing, well-organised collection of evaluation metrics and
tutorials for evaluating explainable methods. The toolkit has been thoroughly
tested and is available under an open-source license on PyPI (or at
https://github.com/understandable-machine-intelligence-lab/Quantus/).
( 2
min )
Domain adaptation of GANs is a problem of fine-tuning the state-of-the-art
GAN models (e.g. StyleGAN) pretrained on a large dataset to a specific domain
with few samples (e.g. painting faces, sketches, etc.). While there are a great
number of methods that tackle this problem in different ways, there are still
many important questions that remain unanswered.
In this paper, we provide a systematic and in-depth analysis of the domain
adaptation problem of GANs, focusing on the StyleGAN model. First, we perform a
detailed exploration of the most important parts of StyleGAN that are
responsible for adapting the generator to a new domain depending on the
similarity between the source and target domains. As a result of this in-depth
study, we propose new efficient and lightweight parameterizations of StyleGAN
for domain adaptation. Particularly, we show there exist directions in
StyleSpace (StyleDomain directions) that are sufficient for adapting to similar
domains and they can be reduced further. For dissimilar domains, we propose
Affine$+$ and AffineLight$+$ parameterizations that allow us to outperform
existing baselines in few-shot adaptation in the low-data regime. Finally, we
examine StyleDomain directions and discover their many surprising properties
that we apply for domain mixing and cross-domain image morphing.
( 3
min )
Data privacy and ownership are significant in social data science, raising
legal and ethical concerns. Sharing and analyzing data is difficult when
different parties own different parts of it. An approach to this challenge is
to apply de-identification or anonymization techniques to the data before
collecting it for analysis. However, this can reduce data utility and increase
the risk of re-identification. To address these limitations, we present PADME,
a distributed analytics tool that federates model implementation and training.
PADME uses a federated approach where the model is implemented and deployed by
all parties and visits each data location incrementally for training. This
enables the analysis of data across locations while still allowing the model to
be trained as if all data were in a single location. Training the model on data
in its original location preserves data ownership. Furthermore, the results are
not provided until the analysis is completed on all data locations to ensure
privacy and avoid bias in the results.
( 3
min )
In smart electrical grids, fault detection tasks may have a high impact on
society due to their economic and critical implications. In recent years,
numerous smart grid applications, such as defect detection and load
forecasting, have embraced data-driven methodologies. The purpose of this study
is to investigate the challenges associated with the security of machine
learning (ML) applications in the smart grid scenario. Indeed, the robustness
and security of these data-driven algorithms have not been extensively studied
in relation to all power grid applications. We demonstrate first that the deep
neural network method used in the smart grid is susceptible to adversarial
perturbation. Then, we highlight how studies on fault localization and type
classification illustrate the weaknesses of present ML algorithms in smart
grids to various adversarial attacks.
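A minimal FGSM-style sketch of the kind of adversarial perturbation at issue (an illustrative toy, not the paper's grid models): perturbing an input to a fixed logistic classifier along the sign of the loss gradient flips the predicted class.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])       # frozen "fault classifier" weights
x = np.array([0.5, 0.1, 0.3])        # a measurement the model classifies

def predict(x):
    return 1 / (1 + np.exp(-w @ x))  # P(fault)

# For true label y=1, the logistic-loss gradient w.r.t. the input is a
# negative multiple of w, so FGSM steps in the direction sign(-w).
eps = 0.3
x_adv = x + eps * np.sign(-w)
print(predict(x), predict(x_adv))    # prediction flips across 0.5
```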
( 2
min )
In this paper, we propose a methodology to align a medium-sized GPT model,
originally trained in English for an open domain, to a small closed domain in
Spanish. The application for which the model is fine-tuned is the question
answering task. To achieve this we also needed to train and implement another
neural network (which we called the reward model) that could score and
determine whether an answer is appropriate for a given question. This component
served to improve the decoding and generation of the answers of the system.
Numerical metrics such as BLEU and perplexity were used to evaluate the model,
and human judgment was also used to compare the decoding technique with others.
Finally, the results favored the proposed method, and it was determined that it
is feasible to use a reward model to align the generation of responses.
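The reward-model idea can be sketched as reranking: the paper trains a neural scorer, but a stand-in scoring function suffices to show how decoding picks the highest-scoring candidate answer. The scorer below is a hypothetical placeholder, not the paper's reward network.

```python
# Placeholder reward: prefer answers that share words with the question.
def reward(question, answer):
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / (1 + len(a_words))

# Use the reward model to select among candidate generations.
def best_answer(question, candidates):
    return max(candidates, key=lambda a: reward(question, a))

q = "capital of France"
cands = ["It is Paris, the capital of France.", "I like turtles."]
print(best_answer(q, cands))
```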
( 2
min )
We study the convex hulls of reachable sets of nonlinear systems with bounded
disturbances. Reachable sets play a critical role in control, but remain
notoriously challenging to compute, and existing over-approximation tools tend
to be conservative or computationally expensive. In this work, we exactly
characterize the convex hulls of reachable sets as the convex hulls of
solutions of an ordinary differential equation from all possible initial values
of the disturbances. This finite-dimensional characterization unlocks a tight
estimation algorithm to over-approximate reachable sets that is significantly
faster and more accurate than existing methods. We present applications to
neural feedback loop analysis and robust model predictive control.
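An illustrative sampling sketch of the characterization (not the paper's exact algorithm): for a scalar system the convex hull of the reachable set is an interval, which we estimate by integrating the dynamics from a grid of constant disturbance values. The system, horizon, and grid are assumptions for illustration.

```python
import numpy as np

# Scalar system dx/dt = -x + w with |w| <= 1 and x(0) = 0; the reachable
# set at time T=1 is the interval [-(1 - e^-1), 1 - e^-1] ~ [-0.632, 0.632].
def simulate(w, T=1.0, dt=0.001, x0=0.0):
    x = x0
    for _ in range(int(T / dt)):
        x += dt * (-x + w)           # forward Euler integration
    return x

endpoints = [simulate(w) for w in np.linspace(-1.0, 1.0, 21)]
hull = (min(endpoints), max(endpoints))  # convex hull in 1D is an interval
print(hull)
```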
( 2
min )
We consider the problem of online multiclass learning when the number of
labels is unbounded. We show that the Multiclass Littlestone dimension, first
introduced in \cite{DanielyERMprinciple}, continues to characterize online
learnability in this setting. Our result complements the recent work by
\cite{Brukhimetal2022} who give a characterization of batch multiclass
learnability when the label space is unbounded.
( 2
min )
People with diabetes have to manage their blood glucose level to keep it
within an appropriate range. Predicting whether future glucose values will be
outside the healthy threshold is of vital importance in order to take
corrective actions to avoid potential health damage. In this paper we describe
our research with the aim of predicting the future behavior of blood glucose
levels, so that hypoglycemic events may be anticipated. The approach of this
work is the application of transformation functions on glucose time series, and
their use in convolutional neural networks. We have tested our proposed method
using real data from 4 different diabetes patients with promising results.
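A hedged sketch of the general pipeline (transform the glucose series, then classify windows); the moving-average transform, window size, and 70 mg/dL threshold are illustrative assumptions, not the paper's learned CNN.

```python
import numpy as np

# Flag windows whose smoothed glucose trend falls below a hypoglycemia
# threshold, using a moving-average "convolution" as the transformation.
def flag_hypoglycemia(glucose, window=4, threshold=70.0):
    kernel = np.ones(window) / window            # moving-average filter
    smoothed = np.convolve(glucose, kernel, mode="valid")
    return smoothed < threshold                  # True where risk is flagged

series = np.array([110, 100, 90, 80, 72, 66, 62, 60], dtype=float)
print(flag_hypoglycemia(series))
```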
( 2
min )
Many organizations measure treatment effects via an experimentation platform
to evaluate the causal effect of product variations prior to full-scale
deployment. However, standard experimentation platforms do not perform
optimally for end user populations that exhibit heterogeneous treatment effects
(HTEs). Here we present a personalized experimentation framework, Personalized
Experiments (PEX), which optimizes treatment group assignment at the user level
via HTE modeling and sequential decision policy optimization to optimize
multiple short-term and long-term outcomes simultaneously. We describe an
end-to-end workflow that has proven to be successful in practice and can be
readily implemented using open-source software.
( 2
min )
Metadata quality is crucial for digital objects to be discovered through
digital library interfaces. However, due to various reasons, the metadata of
digital objects often exhibits incomplete, inconsistent, and incorrect values.
We investigate methods to automatically detect, correct, and canonicalize
scholarly metadata, using seven key fields of electronic theses and
dissertations (ETDs) as a case study. We propose MetaEnhance, a framework that
utilizes state-of-the-art artificial intelligence methods to improve the
quality of these fields. To evaluate MetaEnhance, we compiled a metadata
quality evaluation benchmark containing 500 ETDs, by combining subsets sampled
using multiple criteria. We tested MetaEnhance on this benchmark and found that
the proposed methods achieved nearly perfect F1-scores in detecting errors and
F1-scores in correcting errors ranging from 0.85 to 1.00 for five of seven
fields.
( 2
min )
In this paper, we introduce the range of oBERTa language models, an
easy-to-use set of language models, which allows Natural Language Processing
(NLP) practitioners to obtain between 3.8 and 24.3 times faster models without
expertise in model compression. Specifically, oBERTa extends existing work on
pruning, knowledge distillation, and quantization and leverages frozen
embeddings to improve knowledge distillation, and improved model initialization
to deliver higher accuracy on a broad range of transfer tasks. In generating
oBERTa, we explore how the highly optimized RoBERTa differs from the BERT with
respect to pruning during pre-training and fine-tuning and find it less
amenable to compression during fine-tuning. We explore the use of oBERTa on
seven representative NLP tasks and find that the improved compression
techniques allow a pruned oBERTa model to match the performance of BERTBASE and
exceed the performance of Prune OFA Large on the SQUAD V1.1 Question Answering
dataset, while being 8x and 2x faster at inference, respectively. We release
our code, training regimes, and associated models to encourage broad usage and
experimentation.
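One ingredient oBERTa builds on is magnitude pruning, sketched here with numpy as an assumption-level toy (the sparsity level and layout are illustrative, not oBERTa's recipe): zero out the smallest-magnitude weights of a matrix.

```python
import numpy as np

# Keep only the largest-magnitude (1 - sparsity) fraction of weights.
def magnitude_prune(W, sparsity=0.9):
    k = int(W.size * sparsity)
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) > threshold, W, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W_sparse = magnitude_prune(W)
print((W_sparse == 0).mean())   # fraction of zeroed weights, close to 0.9
```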
( 2
min )
In this paper, we revisit the problem of Differentially Private Stochastic
Convex Optimization (DP-SCO) in Euclidean and general $\ell_p^d$ spaces.
Specifically, we focus on three settings that are still far from well
understood: (1) DP-SCO over a constrained and bounded (convex) set in Euclidean
space; (2) unconstrained DP-SCO in $\ell_p^d$ space; (3) DP-SCO with
heavy-tailed data over a constrained and bounded set in $\ell_p^d$ space. For
problem (1), for both convex and strongly convex loss functions, we propose
methods whose outputs could achieve (expected) excess population risks that are
only dependent on the Gaussian width of the constraint set rather than the
dimension of the space. Moreover, we also show the bound for strongly convex
functions is optimal up to a logarithmic factor. For problems (2) and (3), we
propose several novel algorithms and provide the first theoretical results for
both cases when $1<p<2$ and $2\leq p\leq \infty$.
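The gradient-perturbation template behind many DP-SCO methods (clip the gradient, add Gaussian noise) can be sketched as follows; the noise scale here is an illustrative assumption, not a calibrated $(\epsilon,\delta)$ guarantee, and this is not the paper's algorithm for the $\ell_p^d$ settings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
w_true = np.array([1.0, 0.5, -1.5])
y = X @ w_true

w = np.zeros(3)
clip, sigma, lr = 1.0, 0.05, 0.1
for _ in range(300):
    g = X.T @ (X @ w - y) / len(y)               # least-squares gradient
    g = g / max(1.0, np.linalg.norm(g) / clip)   # clip gradient norm
    g = g + sigma * rng.normal(size=3)           # Gaussian mechanism
    w -= lr * g
print(w)                                         # near [1.0, 0.5, -1.5]
```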
( 2
min )
Here's a video that presents a very interesting solution to alignment problems: https://youtu.be/fKgPg_j9eF0
Hope you learned something new!
submitted by /u/RamazanBlack
( 42
min )
Hi all,
Recently I wrote a small blog article regarding predicting football (soccer) match outcomes using machine learning and bookmakers' odds. I also tested real betting scenarios using the developed ML predictions. TL;DR: Using ML and bookmakers' odds to predict soccer matches gives better-than-literature accuracy; however, it is not enough to provide consistent profit.
Blog post : https://medium.com/@grstathis/predicting-football-soccer-match-outcomes-using-bookmaker-betting-odds-477c62b2e0e9
I hope it is something interesting, feedback is always welcome :-)
submitted by /u/touristroni
( 43
min )
https://www.youtube.com/watch?v=ZZ0atq2yYJw&list=LL&index=3
submitted by /u/norcalnatv
( 43
min )
Paper: https://arxiv.org/abs/2303.17580
Abstract:
Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence (AGI). While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a system that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., HuggingFace) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in HuggingFace, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in HuggingFace, HuggingGPT is able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards AGI.
submitted by /u/Singularian2501
( 48
min )
Bloomberg released BloombergGPT for finance. This is the first of a kind LLM for finance.
https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/
I also reviewed the article and publication on medium. This should give you a TLDR of VERY LONG article.
https://pub.towardsai.net/bloomberggpt-the-first-gpt-for-finance-72670f99566a
submitted by /u/Ok-Range1608
( 47
min )
From the same lab that developed FlashAttention. They tried their approach with 64k tokens, if I read this correctly, and claim it can be scaled up massively.
Blogpost: https://hazyresearch.stanford.edu/blog/2023-03-27-long-learning
Paper: https://arxiv.org/abs/2302.10866#
submitted by /u/ReasonablyBadass
( 46
min )
Epic Games has announced a new project that will allow developers to train ML agents in Unreal Engine.
Post here:
https://dev.epicgames.com/community/learning/tutorials/8OWY/unreal-engine-learning-agents-introduction
Can't wait to play with it! It has only just been announced, so there is no estimate on when they will release it (in beta/experimental form).
( 42
min )
Hey, I was just watching the GTC 2023 Keynote with NVIDIA CEO Jensen Huang (shorter version) and something struck me. It somehow looks weird. Maybe those are just video compression artifacts, but he is very blurry in the lower face area, and you can clearly see it if you pause (not a cherry-picked screencap). Check his keynote from last year; there is no blur at all. And the 2023 video looks weird in some other ways too: the lip sync is kinda off, and so on. I know that Nvidia showed a short Jensen Huang deepfake two years ago, so does this mean that this year they decided to generate the whole keynote and nobody noticed?
( 43
min )
https://youtu.be/AaTRHFaaPG8
This guy is one of the key experts and has a video online called We're all going to die!
It would be great if someone could edit this down to the key points.
( 42
min )
https://github.com/kabouzeid/turm
I wanted to share my latest side project: a simple lazygit-like TUI for the Slurm Workload Manager. I'm still working on adding more functionality, but I wanted to share what I have so far and get feedback from the community.
( 43
min )
https://www.openpetition.eu/petition/online/securing-our-digital-future-a-cern-for-open-source-large-scale-ai-research-and-its-safety
Join us in our urgent mission to democratize AI research by establishing an international, publicly funded supercomputing facility equipped with 100,000 state-of-the-art AI accelerators to train open source foundation models. This monumental initiative will secure our technological independence, empower global innovation, and ensure safety, while safeguarding our democratic principles for generations to come.
( 49
min )
Train a general DNN from scratch that automatically achieves both high performance and a slim structure.
Published at ICLR 2023 and NeurIPS 2021.
Github: https://github.com/tianyic/only_train_once
( 44
min )
This post was co-written with Tony Momenpour and Drew Clark from KYTC. Government departments and businesses operate contact centers to connect with their communities, enabling citizens and customers to call to make appointments, request services, and sometimes just ask a question. When there are more calls than agents can answer, callers get placed on hold […]
( 7
min )
Intelligent document processing (IDP) with AWS helps automate information extraction from documents of different types and formats, quickly and with high accuracy, without the need for machine learning (ML) skills. Faster information extraction with high accuracy can help you make quality business decisions on time, while reducing overall costs. For more information, refer to Intelligent […]
( 8
min )
Like many managers in the corporate world, until recently I thought you should not use these tools. The common theme is that it’s for small projects or classroom problems. Not for the real world. Then, in the process of designing a new course, I had to work with notebooks. Because all classes use notebooks these… Read More »My First Notebook and Colab Project: Sharing my Thoughts
The post My First Notebook and Colab Project: Sharing my Thoughts appeared first on Data Science Central.
( 21
min )
Machine learning (ML) and Artificial Intelligence (AI) have been receiving a lot of public interest in recent years, with both terms being practically common in the IT language. Despite their similarities, there are some important differences between ML and AI that are frequently neglected. Thus we will cover the key differences between ML and AI… Read More »Machine Learning (ML) vs Artificial Intelligence (AI) — Crucial Differences
The post Machine Learning (ML) vs Artificial Intelligence (AI) — Crucial Differences appeared first on Data Science Central.
( 23
min )
AI Weirdness: the strange side of machine learning
( 2
min )
MIT researchers built DiffDock, a model that may one day be able to find new drugs faster than traditional methods and reduce the potential for adverse side effects.
( 10
min )
Several research works have applied Reinforcement Learning (RL) algorithms to
solve the Rate Adaptation (RA) problem in Wi-Fi networks. The dynamic nature of
the radio link requires the algorithms to be responsive to changes in link
quality. Delays in the execution of the algorithm may be detrimental to its
performance, which in turn may decrease network performance. This aspect has
been overlooked in the state of the art. In this paper, we present an analysis
of common computational delays in RL-based RA algorithms, and propose a
methodology that may be applied to reduce these computational delays and
increase the efficiency of this type of algorithm. We apply the proposed
methodology to an existing RL-based RA algorithm. The obtained experimental
results indicate a reduction of one order of magnitude in the execution time of
the algorithm, improving its responsiveness to link quality changes.
( 2
min )
Continual learning (CL) aims to learn a sequence of tasks over time, with
data distributions shifting from one task to another. When training on new task
data, data representations from old tasks may drift. Some negative
representation drift can result in catastrophic forgetting, by causing the
locally learned class prototypes and data representations to correlate poorly
across tasks. To mitigate such representation drift, we propose a method that
finds global prototypes to guide the learning, and learns data representations
with the regularization of the self-supervised information. Specifically, for
NLP tasks, we formulate each task in a masked language modeling style, and
learn the task via a neighbor attention mechanism over a pre-trained language
model. Experimental results show that our proposed method can learn fairly
consistent representations with less representation drift, and significantly
reduce catastrophic forgetting in CL without resampling data from past tasks.
( 2
min )
Offline reinforcement learning (RL) allows for the training of competent
agents from offline datasets without any interaction with the environment.
Online finetuning of such offline models can further improve performance. But
how should we ideally finetune agents obtained from offline RL training? While
offline RL algorithms can in principle be used for finetuning, in practice,
their online performance improves slowly. In contrast, we show that it is
possible to use standard online off-policy algorithms for faster improvement.
However, we find this approach may suffer from policy collapse, where the
policy undergoes severe performance deterioration during initial online
learning. We investigate the issue of policy collapse and how it relates to
data diversity, algorithm choices and online replay distribution. Based on
these insights, we propose a conservative policy optimization procedure that
can achieve stable and sample-efficient online learning from offline
pretraining.
( 2
min )
Multi-view clustering (MvC) aims at exploring the category structure among
multi-view data without label supervision. Multiple views provide more
information than a single view, and thus existing MvC methods can achieve
satisfactory performance. However, their performance might seriously degrade
when the views are noisy in practical scenarios. In this paper, we first
formally investigate the drawback of noisy views and then propose a
theoretically grounded deep MvC method (namely MvCAN) to address this issue.
Specifically, we propose a novel MvC objective that enables un-shared
parameters and inconsistent clustering predictions across multiple views to
reduce the side effects of noisy views. Furthermore, a non-parametric iterative
process is designed to generate a robust learning target for mining multiple
views' useful information. Theoretical analysis reveals that MvCAN works by
achieving multi-view consistency, complementarity, and noise robustness.
Finally, experiments on public datasets demonstrate that MvCAN outperforms
state-of-the-art methods and is robust against the existence of noisy views.
( 2
min )
In the field of functional genomics, the analysis of gene expression profiles
through Machine and Deep Learning is increasingly providing meaningful insight
into a number of diseases. The paper proposes a novel algorithm to perform
Feature Selection on genomic-scale data, which exploits the reconstruction
capabilities of autoencoders and an ad-hoc defined Explainable Artificial
Intelligence-based score in order to select the most informative genes for
diagnosis, prognosis, and precision medicine. Results of the application on a
Chronic Lymphocytic Leukemia dataset evidence the effectiveness of the
algorithm, by identifying and suggesting a set of meaningful genes for further
medical investigation.
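As a rough illustration of reconstruction-based feature scoring (not the paper's actual autoencoder or XAI score), one can use a rank-k SVD as a linear "autoencoder" and rank features by per-feature reconstruction error: features that the low-rank model reconstructs well carry shared, structured signal. The synthetic data and the correlated feature pair below are invented for the demo.

```python
import numpy as np

# Stand-in for autoencoder-based feature scoring: rank-k SVD as a
# linear autoencoder, per-feature reconstruction error as the score.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))
X[:, 0] = X[:, 1] + 0.01 * rng.standard_normal(200)  # correlated pair

mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
k = 3
X_rec = U[:, :k] * s[:k] @ Vt[:k] + mu  # low-rank reconstruction

score = np.mean((X - X_rec) ** 2, axis=0)   # low error = well captured
print(score.argmin() in (0, 1))  # the correlated pair reconstructs best
```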
( 2
min )
Given a graph with a subset of labeled nodes, we are interested in the
quality of the averaging estimator which for an unlabeled node predicts the
average of the observations of its labeled neighbours. We rigorously study
concentration properties, variance bounds and risk bounds in this context.
While the estimator itself is very simple and the data generating process is
too idealistic for practical applications, we believe that our small steps will
contribute towards the theoretical understanding of more sophisticated methods
such as Graph Neural Networks.
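The averaging estimator itself is a one-liner; here is a sketch on a toy 4-node path graph (an assumed setup, not from the paper): an unlabeled node is predicted as the mean of the observations at its labeled neighbours.

```python
import numpy as np

# Averaging estimator on a toy path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
labels = {0: 1.0, 2: 3.0}   # observations at the labeled nodes

def averaging_estimate(node):
    neigh = np.flatnonzero(A[node])
    obs = [labels[v] for v in neigh if v in labels]
    return float(np.mean(obs)) if obs else float("nan")

print(averaging_estimate(1))  # mean of labels at nodes 0 and 2 → 2.0
```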
( 2
min )
We propose a generic spatiotemporal framework to analyze manifold-valued
measurements, which allows for employing an intrinsic and computationally
efficient Riemannian hierarchical model. Particularly, utilizing regression, we
represent discrete trajectories in a Riemannian manifold by composite Bézier
splines, propose a natural metric induced by the Sasaki metric to compare the
trajectories, and estimate average trajectories as group-wise trends. We
evaluate our framework in comparison to state-of-the-art methods within
qualitative and quantitative experiments on hurricane tracks. Notably, our
results demonstrate the superiority of spline-based approaches for an intensity
classification of the tracks.
( 2
min )
We show that symmetrically padded convolution can be analytically inverted
via DFT. We comprehensively analyze several different symmetric and
anti-symmetric padding modes and show that multiple cases exist where the
inversion can be achieved. The implementation is available at
\url{https://github.com/prclibo/iconv_dft}.
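A minimal sketch of the underlying idea, in the simplest setting: circularly padded 1-D convolution is diagonalized by the DFT, so it inverts by pointwise division in frequency space. This uses periodic rather than symmetric padding (the paper's symmetric/anti-symmetric cases require DCT-type transforms instead), so it is an analogue, not the paper's method.

```python
import numpy as np

# Circular convolution inverts exactly via the DFT when the kernel's
# DFT has no zeros (true for this kernel).
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
k_full = np.zeros(16)
k_full[:3] = [0.5, 1.0, 0.25]        # kernel, zero-padded to length 16

K = np.fft.fft(k_full)
y = np.real(np.fft.ifft(np.fft.fft(x) * K))      # forward convolution
x_rec = np.real(np.fft.ifft(np.fft.fft(y) / K))  # analytic inversion

print(np.allclose(x, x_rec))  # → True (exact up to float error)
```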
( 2
min )
We present DiffCollage, a compositional diffusion model that can generate
large content by leveraging diffusion models trained on generating pieces of
the large content. Our approach is based on a factor graph representation where
each factor node represents a portion of the content and a variable node
represents their overlap. This representation allows us to aggregate
intermediate outputs from diffusion models defined on individual nodes to
generate content of arbitrary size and shape in parallel without resorting to
an autoregressive generation procedure. We apply DiffCollage to various tasks,
including infinite image generation, panorama image generation, and
long-duration text-guided motion generation. Extensive experimental results
with a comparison to strong autoregressive baselines verify the effectiveness
of our approach.
( 2
min )
Conventional optimization methods in machine learning and controls rely
heavily on first-order update rules. Selecting the right method and
hyperparameters for a particular task often involves trial-and-error or
practitioner intuition, motivating the field of meta-learning. We generalize a
broad family of preexisting update rules by proposing a meta-learning framework
in which the inner loop optimization step involves solving a differentiable
convex optimization (DCO). We illustrate the theoretical appeal of this
approach by showing that it enables one-step optimization of a family of linear
least squares problems, given that the meta-learner has sufficient exposure to
similar tasks. Various instantiations of the DCO update rule are compared to
conventional optimizers on a range of illustrative experimental settings.
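The one-step claim for linear least squares is easy to illustrate: the inner convex problem has a closed form (the normal equations), so a single "update" reaches the exact solution. The sketch below shows only that closed form, not the meta-learned DCO rule itself; the data is synthetic.

```python
import numpy as np

# One-step optimization of a noiseless linear least-squares problem
# via its normal equations -- the exact solution a DCO inner step
# can recover in a single update.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))
theta_true = np.array([2.0, -1.0, 0.5])
b = A @ theta_true

theta = np.linalg.solve(A.T @ A, A.T @ b)  # single closed-form step
print(np.allclose(theta, theta_true))      # → True
```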
( 2
min )
In orthogonal world coordinates, a Manhattan world lying along cuboid
buildings is widely useful for various computer vision tasks. However, the
Manhattan world has much room for improvement because the origin of pan angles
estimated from an image is arbitrary; that is, pan angles suffer from a
four-fold rotational-symmetric ambiguity. To address this problem, we propose a
definition for the
pan-angle origin based on the directions of the roads with respect to a camera
and the direction of travel. We propose a learning-based calibration method
that uses heatmap regression to remove the ambiguity by each direction of
labeled image coordinates, similar to pose estimation keypoints.
Simultaneously, our two-branched network recovers the rotation and removes
fisheye distortion from a general scene image. To alleviate the lack of
vanishing points in images, we introduce auxiliary diagonal points that have
the optimal 3D arrangement of spatial uniformity. Extensive experiments
demonstrated that our method outperforms conventional methods on large-scale
datasets and with off-the-shelf cameras.
( 2
min )
The practice of uncertainty quantification (UQ) validation, notably in
machine learning for the physico-chemical sciences, rests on several graphical
methods (scattering plots, calibration curves, reliability diagrams and
confidence curves) which explore complementary aspects of calibration, without
covering all the desirable ones. For instance, none of these methods deals with
the reliability of UQ metrics across the range of input features (adaptivity).
Based on the complementary concepts of consistency and adaptivity, the toolbox
of common validation methods for variance- and intervals- based UQ metrics is
revisited with the aim to provide a better grasp on their capabilities. This
study is conceived as an introduction to UQ validation, and all methods are
derived from a few basic rules. The methods are illustrated and tested on
synthetic datasets and representative examples extracted from the recent
physico-chemical machine learning UQ literature.
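A minimal example of one such consistency check for a variance-based UQ metric (an assumed Gaussian-error setting, not a method from the paper): if the predicted sigmas are calibrated, the z-scores error/sigma should be roughly standard normal, so about 68% of |z| values fall below 1.

```python
import numpy as np

# Consistency check: coverage of |error/sigma| < 1 for a calibrated model.
rng = np.random.default_rng(2)
sigma = rng.uniform(0.5, 2.0, 100_000)        # predicted uncertainties
error = rng.standard_normal(100_000) * sigma  # calibrated actual errors

z = error / sigma
coverage = np.mean(np.abs(z) < 1.0)
print(round(float(coverage), 2))  # close to 0.68 for a calibrated model
```

A miscalibrated model (e.g., sigmas uniformly too small) would push this coverage well below 0.68, which is exactly what such a consistency diagnostic is meant to reveal.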
( 2
min )
Multi-label learning is usually used to mine the correlation between features
and labels, and feature selection can retain as much information as possible
through a small number of features. The $\ell_{2,1}$ regularization method can
produce a sparse coefficient matrix, but it cannot solve the multicollinearity
problem effectively. The model proposed in this paper obtains the most relevant
few features by solving a joint constrained optimization problem with
$\ell_{2,1}$ and $\ell_{F}$ regularization. In the manifold regularization, we
implement a random walk strategy based on the joint information matrix and
obtain a highly robust neighborhood graph. In addition, we give an algorithm
for solving the model and prove its convergence. Comparative experiments on
real-world data sets show that the proposed method outperforms other methods.
( 2
min )
A common lens to theoretically study neural net architectures is to analyze
the functions they can approximate. However, constructions from approximation
theory may be unrealistic and therefore less meaningful. For example, a common
unrealistic trick is to encode target function values using infinite precision.
To address these issues, this work proposes a formal definition of
statistically meaningful (SM) approximation which requires the approximating
network to exhibit good statistical learnability. We study SM approximation for
two function classes: boolean circuits and Turing machines. We show that
overparameterized feedforward neural nets can SM approximate boolean circuits
with sample complexity depending only polynomially on the circuit size, not the
size of the network. In addition, we show that transformers can SM approximate
Turing machines with computation time bounded by $T$ with sample complexity
polynomial in the alphabet size, state space size, and $\log (T)$. We also
introduce new tools for analyzing generalization which provide much tighter
sample complexities than the typical VC-dimension or norm-based bounds, which
may be of independent interest.
( 3
min )
We propose to compute Wasserstein barycenters (WBs) by solving for Monge maps
with a variational principle. We discuss the metric properties of WBs and explore
their connections, especially the connections of Monge WBs, to K-means
clustering and co-clustering. We also discuss the feasibility of Monge WBs on
unbalanced measures and spherical domains. We propose two new problems --
regularized K-means and Wasserstein barycenter compression. We demonstrate the
use of VWBs in solving these clustering-related problems.
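In one dimension the Monge-map viewpoint becomes concrete: for empirical measures with equal numbers of atoms, the Wasserstein-2 barycenter is obtained by averaging sorted samples (quantile averaging). This toy case is a standard special case, not the paper's general algorithm.

```python
import numpy as np

# 1-D Wasserstein-2 barycenter of two 3-atom empirical measures:
# sort each sample and average position-wise (quantile averaging).
a = np.array([0.0, 1.0, 2.0])
b = np.array([4.0, 6.0, 8.0])

barycenter = (np.sort(a) + np.sort(b)) / 2
print(barycenter)  # atoms at 2.0, 3.5, 5.0
```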
( 2
min )
This paper studies the approximation capacity of ReLU neural networks with
norm constraint on the weights. We prove upper and lower bounds on the
approximation error of these networks for smooth function classes. The lower
bound is derived through the Rademacher complexity of neural networks, which
may be of independent interest. We apply these approximation bounds to analyze
the convergences of regression using norm constrained neural networks and
distribution estimation by GANs. In particular, we obtain convergence rates for
over-parameterized neural networks. It is also shown that GANs can achieve
optimal rate of learning probability distributions, when the discriminator is a
properly chosen norm constrained neural network.
( 2
min )
The extragradient (EG), introduced by G. M. Korpelevich in 1976, is a
well-known method to approximate solutions of saddle-point problems and their
extensions such as variational inequalities and monotone inclusions. Over the
years, numerous variants of EG have been proposed and studied in the
literature. Recently, these methods have gained popularity due to new
applications in machine learning and robust optimization. In this work, we
survey the latest developments in the EG method and its variants for
approximating solutions of nonlinear equations and inclusions, with a focus on
the monotonicity and co-hypomonotonicity settings. We provide a unified
convergence analysis for different classes of algorithms, with an emphasis on
sublinear best-iterate and last-iterate convergence rates. We also discuss
recent accelerated variants of EG based on both Halpern fixed-point iteration
and Nesterov's accelerated techniques. Our approach uses simple arguments and
basic mathematical tools to make the proofs as elementary as possible, while
maintaining generality to cover a broad range of problems.
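The basic EG iteration is short enough to state directly. Below is a sketch on the bilinear saddle problem min_x max_y xy, whose operator is F(x, y) = (y, -x); this toy problem and the step size are chosen for illustration. Plain gradient descent-ascent spirals outward here, while EG's extrapolation step makes it contract to the saddle point (0, 0).

```python
import numpy as np

# Extragradient on the monotone operator F(x, y) = (y, -x)
# arising from the saddle problem min_x max_y x*y.
def F(z):
    x, y = z
    return np.array([y, -x])

z = np.array([1.0, 1.0])
eta = 0.3  # step size, with eta * L < 1 for Lipschitz constant L = 1
for _ in range(500):
    z_half = z - eta * F(z)     # extrapolation (look-ahead) step
    z = z - eta * F(z_half)     # update using the extrapolated gradient

print(np.allclose(z, 0.0, atol=1e-3))  # → True: converged to the saddle
```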
( 2
min )
We present a hierarchical Bayesian learning approach to infer jointly sparse
parameter vectors from multiple measurement vectors. Our model uses separate
conditionally Gaussian priors for each parameter vector and common
gamma-distributed hyper-parameters to enforce joint sparsity. The resulting
joint-sparsity-promoting priors are combined with existing Bayesian inference
methods to generate a new family of algorithms. Our numerical experiments,
which include a multi-coil magnetic resonance imaging application, demonstrate
that our new approach consistently outperforms commonly used hierarchical
Bayesian methods.
( 2
min )
Paper: https://arxiv.org/abs/2303.16434
Abstract:
Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still face difficulties with some specialized tasks because they lack enough domain specific data during pre-training or they often have errors in their neural network computations on those tasks that need accurate executions. On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain …
( 45
min )
How easy? As easy as:
python usap_csv_eval.py data/credit-approval.csv
If your dataset is in csv format you can use this tool to get an initial indication of how predictable a target feature is. No need to sort attributes, look for missing cells etc.
The tool uses "deodel" as a robust mixed attribute classifier. Get more details at:
csv_dataset_eval.ipynb
( 44
min )
With the right building blocks, machine-learning models can more accurately perform tasks like fraud detection or spam filtering.
( 9
min )
Amazon Personalize is excited to announce the new Trending-Now recipe to help you recommend items gaining popularity at the fastest pace among your users. Amazon Personalize is a fully managed machine learning (ML) service that makes it easy for developers to deliver personalized experiences to their users. It enables you to improve customer engagement by […]
( 10
min )
In football, ball possession is a strong predictor for team success. It’s hard to control the game without having control over the ball. In the past three Bundesliga seasons, as well as in the current season (at the time of this writing), Bayern Munich is ranked first in the table and in ball possession percentage, […]
( 8
min )
The Bundesliga is renowned for its exceptional goalkeepers, making it potentially the most prominent among Europe’s top five leagues in this regard. Apart from the widely recognized Manuel Neuer, the Bundesliga has produced remarkable goalkeepers who have excelled in other leagues, including the likes of Marc-André ter Stegen, who is a superstar at Barcelona. In […]
( 9
min )
Powerful new large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come. In this new Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these […]
The post AI Frontiers: AI for health and the future of research with Peter Lee appeared first on Microsoft Research.
( 27
min )
It’s another rewarding GFN Thursday, with 23 new games for April on top of 11 joining the cloud this week and a new Marvel’s Midnight Suns reward now available first for GeForce NOW Premium members. Newark, N.J., is next to complete its upgrade to RTX 4080 SuperPODs, making it the 12th region worldwide to bring Read article >
( 6
min )
There are plenty of graph neural network (GNN) accelerators being proposed.
However, they rely heavily on users' hardware expertise and are usually
optimized for one specific GNN model, making them challenging for practical
use. Therefore, in this work, we propose GNNBuilder, the first automated, generic,
end-to-end GNN accelerator generation framework. It features four advantages:
(1) GNNBuilder can automatically generate GNN accelerators for a wide range of
GNN models arbitrarily defined by users; (2) GNNBuilder takes standard PyTorch
programming interface, introducing zero overhead for algorithm developers; (3)
GNNBuilder supports end-to-end code generation, simulation, accelerator
optimization, and hardware deployment, realizing a push-button fashion for GNN
accelerator design; (4) GNNBuilder is equipped with accurate performance models
of its generated accelerator, enabling fast and flexible design space
exploration (DSE). In the experiments, first, we show that our accelerator
performance model has errors within $36\%$ for latency prediction and $18\%$
for BRAM count prediction. Second, we show that our generated accelerators can
outperform CPU by $6.33\times$ and GPU by $6.87\times$. This framework is
open-source, and the code is available at
https://anonymous.4open.science/r/gnn-builder-83B4/.
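For context, the kind of standard GNN computation such a framework consumes via the usual PyTorch interface is a message-passing layer. The NumPy sketch below shows one GCN-style layer (symmetric normalization with self-loops, then ReLU); it is a generic illustration, not GNNBuilder's API.

```python
import numpy as np

# One GCN-style message-passing layer: H' = ReLU(D^{-1/2} Â D^{-1/2} H W),
# where Â is the adjacency matrix with self-loops added.
def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

A = np.array([[0.0, 1.0], [1.0, 0.0]])       # two connected nodes
H = np.eye(2)                                # one-hot node features
W = np.eye(2)                                # identity weights for the demo
out = gcn_layer(A, H, W)
print(out.shape)  # → (2, 2)
```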
( 2
min )
We present a multimodal deep learning (MDL) framework for predicting physical
properties of a 10-dimensional acrylic polymer composite material by merging
physical attributes and chemical data. Our MDL model comprises four modules,
including three generative deep learning models for material structure
characterization and a fourth model for property prediction. Our approach
handles an 18-dimensional complexity, with 10 compositional inputs and 8
property outputs, successfully predicting 913,680 property data points across
114,210 composition conditions. This level of complexity is unprecedented in
computational materials science, particularly for materials with undefined
structures. We propose a framework to analyze the high-dimensional information
space for inverse material design, demonstrating flexibility and adaptability
to various materials and scales, provided sufficient data is available. This
study advances future research on different materials and the development of
more sophisticated models, drawing us closer to the ultimate goal of predicting
all properties of all materials.
( 2
min )
In this work we develop a novel approach using deep neural networks to
reconstruct the conductivity distribution in elliptic problems from one
internal measurement. The approach is based on a mixed reformulation of the
governing equation and utilizes the standard least-squares objective to
approximate the conductivity and flux simultaneously, with deep neural networks
as ansatz functions. We provide a thorough analysis of the neural network
approximations for both continuous and empirical losses, including rigorous
error estimates that are explicit in terms of the noise level, various penalty
parameters and neural network architectural parameters (depth, width and
parameter bound). We also provide extensive numerical experiments in two- and
multi-dimensions to illustrate distinct features of the approach, e.g.,
excellent stability with respect to data noise and capability of solving
high-dimensional problems.
( 2
min )
An increasing part of energy is produced from renewable sources by a large
number of small producers. The efficiency of these sources is volatile and, to
some extent, random, exacerbating the energy market balance problem. In many
countries, that balancing is performed on day-ahead (DA) energy markets. In
this paper, we consider automated trading on a DA energy market by a medium
size prosumer. We model this activity as a Markov Decision Process and
formalize a framework in which a ready-to-use strategy can be optimized with
real-life data. We synthesize parametric trading strategies and optimize them
with an evolutionary algorithm. We also use state-of-the-art reinforcement
learning algorithms to optimize a black-box trading strategy fed with available
information from the environment that can impact future prices.
( 2
min )
This note focuses on a simple approach to the unified analysis of SGD-type
methods from (Gorbunov et al., 2020) for strongly convex smooth optimization
problems. The similarities in the analyses of different stochastic first-order
methods are discussed along with the existing extensions of the framework. The
limitations of the analysis and several alternative approaches are mentioned as
well.
( 2
min )
The generalization performance of deep neural networks with regard to the
optimization algorithm is one of the major concerns in machine learning. This
performance can be affected by various factors. In this paper, we theoretically
prove that the Lipschitz constant of a loss function is an important factor to
diminish the generalization error of the output model obtained by Adam or
AdamW. The results can be used as a guideline for choosing the loss function
when the optimization algorithm is Adam or AdamW. In addition, to evaluate the
theoretical bound in a practical setting, we choose the human age estimation
problem in computer vision. For assessing the generalization better, the
training and test datasets are drawn from different distributions. Our
experimental evaluation shows that a loss function with a lower Lipschitz
constant and maximum value improves the generalization of the model trained by
Adam or AdamW.
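The Lipschitz-constant contrast between losses is easy to see numerically (this toy comparison is an illustration of the concept, not the paper's experiment): the MAE loss has gradient magnitude bounded by 1 in the prediction, while the squared loss has a gradient growing with the error, so it admits no global Lipschitz bound.

```python
import numpy as np

# Gradient magnitudes of two losses as a function of the error e:
# |d|e|/de| <= 1 (MAE is 1-Lipschitz), while |d(e^2)/de| = 2|e| is unbounded.
errors = np.linspace(-10, 10, 1001)
grad_mae = np.sign(errors)   # subgradient of |e|
grad_mse = 2 * errors        # gradient of e^2

print(np.max(np.abs(grad_mae)), np.max(np.abs(grad_mse)))  # 1.0 20.0
```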
( 2
min )
We investigate semantic guarantees of private learning algorithms for their
resilience to training Data Reconstruction Attacks (DRAs) by informed
adversaries. To this end, we derive non-asymptotic minimax lower bounds on the
adversary's reconstruction error against learners that satisfy differential
privacy (DP) and metric differential privacy (mDP). Furthermore, we demonstrate
that our lower bound analysis for the latter also covers the high dimensional
regime, wherein, the input data dimensionality may be larger than the
adversary's query budget. Motivated by the theoretical improvements conferred
by metric DP, we extend the privacy analysis of popular deep learning
algorithms such as DP-SGD and Projected Noisy SGD to cover the broader notion
of metric differential privacy.
( 2
min )
The paper discusses the limitations of deep learning models in identifying
and utilizing features that remain invariant under a bijective transformation
on the data entries, which we refer to as combinatorial patterns. We argue that
the identification of such patterns may be important for certain applications
and suggest providing neural networks with information that fully describes the
combinatorial patterns of input entries and allows the network to determine
what is relevant for prediction. To demonstrate the feasibility of this
approach, we present a combinatorial convolutional neural network for word
classification.
( 2
min )
Predicting crime using machine learning and deep learning techniques has
gained considerable attention from researchers in recent years, focusing on
identifying patterns and trends in crime occurrences. This review paper
examines over 150 articles to explore the various machine learning and deep
learning algorithms applied to predict crime. The study provides access to the
datasets used for crime prediction by researchers and analyzes prominent
approaches applied in machine learning and deep learning algorithms to predict
crime, offering insights into different trends and factors related to criminal
activities. Additionally, the paper highlights potential gaps and future
directions that can enhance the accuracy of crime prediction. Finally, the
comprehensive overview of research discussed in this paper on crime prediction
using machine learning and deep learning approaches serves as a valuable
reference for researchers in this field. By gaining a deeper understanding of
crime prediction techniques, law enforcement agencies can develop strategies to
prevent and respond to criminal activities more effectively.
( 3
min )
Classical results in neural network approximation theory show how arbitrary
continuous functions can be approximated by networks with a single hidden
layer, under mild assumptions on the activation function. However, the
classical theory does not give a constructive means to generate the network
parameters that achieve a desired accuracy. Recent results have demonstrated
that for specialized activation functions, such as ReLUs and some classes of
analytic functions, high accuracy can be achieved via linear combinations of
randomly initialized activations. These recent works utilize specialized
integral representations of target functions that depend on the specific
activation functions used. This paper defines mollified integral
representations, which provide a means to form integral representations of
target functions using activations for which no direct integral representation
is currently known. The new construction enables approximation guarantees for
randomly initialized networks for a variety of widely used activation
functions.
( 2
min )
This paper provides a finite-time analysis of linear stochastic approximation
(LSA) algorithms with fixed step size, a core method in statistics and machine
learning. LSA is used to compute approximate solutions of a $d$-dimensional
linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$ for which
$(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by
(asymptotically) unbiased observations
$\{(\mathbf{A}(Z_n),\mathbf{b}(Z_n))\}_{n \in \mathbb{N}}$. We consider here
the case where $\{Z_n\}_{n \in \mathbb{N}}$ is an i.i.d. sequence or a
uniformly geometrically ergodic Markov chain. We derive $p$-th moment and
high-probability deviation bounds for the iterates defined by LSA and its
Polyak-Ruppert-averaged version. Our finite-time instance-dependent bounds for
the averaged LSA iterates are sharp in the sense that the leading term we
obtain coincides with the local asymptotic minimax limit. Moreover, the
remainder terms of our bounds admit a tight dependence on the mixing time
$t_{\operatorname{mix}}$ of the underlying chain and the norm of the noise
variables. We emphasize that our result requires the SA step size to scale only
with the logarithm of the problem dimension $d$.
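A minimal scalar ($d = 1$) sketch of the procedure analyzed above, with illustrative values of our own choosing: fixed-step LSA driven by i.i.d. unbiased observations of $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$, followed by Polyak-Ruppert averaging of the iterates.

```python
import random

random.seed(0)

# Target system: A_bar * theta = b_bar with A_bar = 2, b_bar = 4, so theta* = 2.
# We only observe unbiased noisy versions (A_n, b_n); noise levels are illustrative.
A_bar, b_bar = 2.0, 4.0
alpha = 0.05                     # fixed step size
theta, running_sum = 0.0, 0.0
n_steps = 20000

for n in range(n_steps):
    A_n = A_bar + random.gauss(0.0, 0.5)   # unbiased observation of A_bar
    b_n = b_bar + random.gauss(0.0, 0.5)   # unbiased observation of b_bar
    theta = theta + alpha * (b_n - A_n * theta)  # LSA update
    running_sum += theta

theta_pr = running_sum / n_steps  # Polyak-Ruppert average of the iterates
print(theta_pr)  # close to theta* = 2
```

The averaged iterate concentrates around the solution even though each raw iterate keeps fluctuating at a scale set by the fixed step size.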
( 2
min )
In this paper we derive a Probably Approximately Correct (PAC)-Bayesian error
bound for linear time-invariant (LTI) stochastic dynamical systems with inputs.
Such bounds are widespread in machine learning, and they are useful for
characterizing the predictive power of models learned from finitely many data
points. In particular, the bound derived in this paper relates future
average prediction errors to the prediction error generated by the model on
the data used for learning. In turn, this allows us to provide finite-sample
error bounds for a wide class of learning/system identification algorithms.
Furthermore, as LTI systems are a sub-class of recurrent neural networks
(RNNs), these error bounds could be a first step towards PAC-Bayesian bounds
for RNNs.
( 2
min )
This paper considers binary classification of high-dimensional features under
a postulated model with a low-dimensional latent Gaussian mixture structure and
non-vanishing noise. A generalized least squares estimator is used to estimate
the direction of the optimal separating hyperplane. The estimated hyperplane is
shown to interpolate on the training data. While the direction vector can be
consistently estimated as could be expected from recent results in linear
regression, a naive plug-in estimate fails to consistently estimate the
intercept. A simple correction, that requires an independent hold-out sample,
renders the procedure minimax optimal in many scenarios. The interpolation
property of the latter procedure can be retained, but surprisingly depends on
the way the labels are encoded.
( 2
min )
We propose a linear contextual bandit algorithm with $O(\sqrt{dT\log T})$
regret bound, where $d$ is the dimension of contexts and $T$ is the time
horizon. Our proposed algorithm is equipped with a novel estimator in which
exploration is embedded through explicit randomization. Depending on the
randomization, our proposed estimator takes contributions either from contexts
of all arms or from selected contexts. We establish a self-normalized bound for
our estimator, which allows a novel decomposition of the cumulative regret into
\textit{additive} dimension-dependent terms instead of multiplicative terms. We
also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem
setting. Hence, the regret of our proposed algorithm matches the lower bound up
to logarithmic factors. The numerical experiments support the theoretical
guarantees and show that our proposed method outperforms the existing linear
bandit algorithms.
( 2
min )
Orthogonality constraints naturally appear in many machine learning problems,
from Principal Components Analysis to robust neural network training. They are
usually solved using Riemannian optimization algorithms, which minimize the
objective function while enforcing the constraint. However, enforcing the
orthogonality constraint can be the most time-consuming operation in such
algorithms. Recently, Ablin & Peyré (2022) proposed the Landing algorithm, a
method with cheap iterations that does not enforce the orthogonality constraint
but is attracted towards the manifold in a smooth manner. In this article, we
provide new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold, the set of rectangular
orthogonal matrices. We also consider stochastic and variance reduction
algorithms when the cost function is an average of many functions. We
demonstrate that all these methods have the same rate of convergence as their
Riemannian counterparts that exactly enforce the constraint. Finally, our
experiments demonstrate the promise of our approach to an array of
machine-learning problems that involve orthogonality constraints.
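A minimal toy sketch of the landing idea (ours, not the authors' code): instead of projecting onto the manifold, each step adds the infeasibility correction $\lambda X(X^\top X - I)$, which smoothly attracts $X$ toward orthogonality. For simplicity we iterate only this correction term on a 2x2 matrix; a full landing step would also include a relative-gradient term of the objective.

```python
# Toy landing-style iteration on a 2x2 matrix (pure Python, no objective term).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def infeasibility(X):
    # X^T X - I: zero exactly when X is orthogonal.
    G = matmul(transpose(X), X)
    return [[G[i][j] - (1.0 if i == j else 0.0) for j in range(2)]
            for i in range(2)]

def landing_step(X, eta=0.1, lam=1.0):
    # X <- X - eta * lam * X (X^T X - I); cheap, no retraction/projection needed.
    C = matmul(X, infeasibility(X))
    return [[X[i][j] - eta * lam * C[i][j] for j in range(2)] for i in range(2)]

X = [[1.2, 0.0], [0.0, 0.8]]  # not orthogonal
for _ in range(200):
    X = landing_step(X)

E = infeasibility(X)
err = max(abs(E[i][j]) for i in range(2) for j in range(2))
print(err)  # close to 0: X has "landed" on the orthogonal manifold
```

Each iteration costs only matrix multiplications, which is the point: the expensive exact projection of Riemannian methods is replaced by this smooth attraction.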
( 2
min )
Inverse optimal control methods can be used to characterize behavior in
sequential decision-making tasks. Most existing work, however, requires the
control signals to be known, or is limited to fully-observable or linear
systems. This paper introduces a probabilistic approach to inverse optimal
control for stochastic non-linear systems with missing control signals and
partial observability that unifies existing approaches. By using an explicit
model of the noise characteristics of the sensory and control systems of the
agent in conjunction with local linearization techniques, we derive an
approximate likelihood for the model parameters, which can be computed within a
single forward pass. We evaluate our proposed method on stochastic and
partially observable versions of classic control tasks, a navigation task, and a
manual reaching task. The proposed method has broad applicability, ranging from
imitation learning to sensorimotor neuroscience.
( 2
min )
Individualized treatment decisions can improve health outcomes, but using
data to make these decisions in a reliable, precise, and generalizable way is
challenging with a single dataset. Leveraging multiple randomized controlled
trials allows for the combination of datasets with unconfounded treatment
assignment to improve the power to estimate heterogeneous treatment effects.
This paper discusses several non-parametric approaches for estimating
heterogeneous treatment effects using data from multiple trials. We extend
single-study methods to a scenario with multiple trials and explore their
performance through a simulation study, with data generation scenarios that
have differing levels of cross-trial heterogeneity. The simulations demonstrate
that methods that directly allow for heterogeneity of the treatment effect
across trials perform better than methods that do not, and that the choice of
single-study method matters based on the functional form of the treatment
effect. Finally, we discuss which methods perform well in each setting and then
apply them to four randomized controlled trials to examine effect heterogeneity
of treatments for major depressive disorder.
( 2
min )
In this paper, we propose a randomly projected convex clustering model for
clustering a collection of $n$ high dimensional data points in $\mathbb{R}^d$
with $K$ hidden clusters. Compared to the convex clustering model for
clustering original data with dimension $d$, we prove that, under some mild
conditions, the perfect recovery of the cluster membership assignments of the
convex clustering model, if it exists, can be preserved by the randomly projected
convex clustering model with embedding dimension $m = O(\epsilon^{-2}\log(n))$,
where $0 < \epsilon < 1$ is some given parameter. We further prove that the
embedding dimension can be improved to be $O(\epsilon^{-2}\log(K))$, which is
independent of the number of data points. Extensive numerical experiments are
presented to demonstrate the robustness and superior performance of the
randomly projected convex clustering model. The
numerical results presented in this paper also demonstrate that the randomly
projected convex clustering model can outperform the randomly projected K-means
model in practice.
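The random projection behind such results can be sketched as follows (an illustrative Gaussian projection of our own, with arbitrary dimensions): points in $\mathbb{R}^d$ are mapped to $\mathbb{R}^m$ via $x \mapsto Gx/\sqrt{m}$ with $G$ having i.i.d. standard normal entries, which preserves pairwise distances up to a small multiplicative factor.

```python
import math
import random

random.seed(1)

d, m = 200, 40   # original and embedding dimensions (illustrative)
G = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

def project(x):
    # x -> (1/sqrt(m)) * G x
    return [sum(G[i][j] * x[j] for j in range(d)) / math.sqrt(m)
            for i in range(m)]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Two points standing in for members of well-separated clusters.
x = [1.0] * d
y = [-1.0] * d
ratio = dist(project(x), project(y)) / dist(x, y)
print(ratio)  # close to 1: the projection roughly preserves the distance
```

Because cluster recovery in convex clustering depends on such pairwise distances, preserving them after projection is what makes the low-dimensional model work.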
( 2
min )
The maximum likelihood method is the best-known method for estimating the
probabilities behind the data. However, the conventional method obtains the
probability model closest to the empirical distribution, resulting in
overfitting. Then regularization methods prevent the model from being
excessively close to the wrong probability, but little is known systematically
about their performance. The idea of regularization is similar to
error-correcting codes, which obtain optimal decoding by mixing suboptimal
solutions with an incorrectly received code. The optimal decoding in
error-correcting codes is achieved based on gauge symmetry. We propose a
theoretically guaranteed regularization in the maximum likelihood method by
focusing on a gauge symmetry in the Kullback–Leibler divergence. In our
approach, we obtain the optimal model without the need to search for
hyperparameters frequently appearing in regularization.
( 2
min )
Lightning AI released Lit-LLaMa: an architecture based on Meta’s LLaMa but with a more permissive license. However, they still rely on the weights trained by Meta, which have a license restricting commercial usage.
Is developing the architecture enough to change the license associated with the model’s weights?
( 47
min )
This is a guest post by Neslihan Erdogan, Global Industrial IT Manager at HAYAT HOLDING. With the ongoing digitization of the manufacturing processes and Industry 4.0, there is enormous potential to use machine learning (ML) for quality prediction. Process manufacturing is a production method that uses formulas or recipes to produce goods by combining ingredients […]
( 11
min )
On November 30, 2021, we announced the general availability of Amazon SageMaker Canvas, a visual point-and-click interface that enables business analysts to generate highly accurate machine learning (ML) predictions without having to write a single line of code. With Canvas, you can take ML mainstream throughout your organization so business analysts without data science or […]
( 7
min )
The United Nations (UN) was founded in 1945 by 51 original Member States committed to maintaining international peace and security, developing friendly relations among nations, and promoting social progress, better living standards, and human rights. The UN is currently made up of 193 Member States and has evolved over the years to keep pace with […]
( 9
min )
With further development, the programmable system could be used in a range of applications including gene and cancer therapies.
( 8
min )
Announcements New Books and Courses Explore Synthetic Data, ML Strategies MLtechniques released two new books recently. The first one, now at version 4.1, deals with synthetic data. This updated version includes a chapter on GANs (generative adversarial networks), with a comparison to more traditional methods such as copulas. Applied to real-life datasets, the author discusses the…
The post DSC Weekly 29 March 2023 – New Books and Courses Explore Synthetic Data, ML Strategies appeared first on Data Science Central.
( 19
min )
Blender, the world’s most popular 3D creation suite — free and open source — released its major version 3.5 update. Expected to have a profound impact on 3D creative workflows, this latest release features support for Open Shading Language (OSL) shaders with the NVIDIA OptiX ray-tracing engine.
( 7
min )
Tools like ChatGPT have awakened the world to the potential of generative AI. Now, much more is coming. On the latest episode of the NVIDIA AI Podcast, Yves Jacquier, executive director of Ubisoft La Forge, shares valuable insights into the transformative potential of generative AI in the gaming industry. With over two decades of experience Read article >
( 5
min )
We revisit the Gaussian process model with spherical harmonic features and
study connections between the associated RKHS, its eigenstructure and deep
models. Based on this, we introduce a new class of kernels which correspond to
deep models of continuous depth. In our formulation, depth can be estimated as
a kernel hyper-parameter by optimizing the evidence lower bound. Further, we
introduce sparseness in the eigenbasis by variational learning of the spherical
harmonic phases. This enables scaling to larger input dimensions than
previously, while also allowing for learning of high frequency variations. We
validate our approach on machine learning benchmark datasets.
( 2
min )
The estimation of the generalization error of classifiers often relies on a
validation set. Such a set is rarely available in few-shot learning scenarios,
an often overlooked shortcoming in the field. In these scenarios, it is common
to rely on features extracted from pre-trained neural networks combined with
distance-based classifiers such as nearest class mean. In this work, we
introduce a Gaussian model of the feature distribution. By estimating the
parameters of this model, we are able to predict the generalization error on
new classification tasks with few samples. We observe that accurate distance
estimates between class-conditional densities are the key to accurate estimates
of the generalization performance. Therefore, we propose an unbiased estimator
for these distances and integrate it in our numerical analysis. We empirically
show that our approach outperforms alternatives such as the leave-one-out
cross-validation strategy.
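A hedged sketch of why distance estimates suffice under a Gaussian model (a textbook one-dimensional special case, not the paper's estimator): for two equal-variance Gaussian classes with equal priors, the error of the optimal (nearest-class-mean) decision is a closed-form function of the estimated class distance.

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predicted_error(mu1, mu2, sigma):
    # Bayes error between N(mu1, sigma^2) and N(mu2, sigma^2), equal priors:
    # Phi(-delta / (2 * sigma)) with delta = |mu1 - mu2|.
    delta = abs(mu1 - mu2)
    return normal_cdf(-delta / (2.0 * sigma))

err = predicted_error(0.0, 2.0, 1.0)
print(round(err, 4))  # 0.1587: the decision boundary sits one sigma from each mean
```

With means and (co)variances estimated from a handful of samples, plugging the estimated distance into such a formula yields a generalization-error prediction without any validation set, which is the spirit of the approach.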
( 2
min )
Images generated by high-resolution SAR have vast areas of application as
they can work better in adverse light and weather conditions. One such area of
application is military systems. This study is an attempt to explore the
suitability of current state-of-the-art models introduced in the domain of
computer vision for SAR target classification (MSTAR). Since the application of
any solution produced for military systems would be strategic and real-time,
accuracy is often not the only criterion to measure its performance. Other
important parameters like prediction time and input resiliency are equally
important. The paper deals with these issues in the context of SAR images.
Experimental results show that deep learning models can be suitably applied in
the domain of SAR image classification with the desired performance levels.
( 2
min )
There is considerable evidence that machine learning algorithms have better
predictive abilities than humans in various financial settings. But, the
literature has not tested whether these algorithmic predictions are more
rational than human predictions. We study the predictions of corporate earnings
from several algorithms, notably linear regressions and a popular algorithm
called Gradient Boosted Regression Trees (GBRT). On average, GBRT outperformed
both linear regressions and human stock analysts, but it still overreacted to
news and did not satisfy rational expectations as normally defined. By reducing
the learning rate, the magnitude of overreaction can be minimized, but it comes
with the cost of poorer out-of-sample prediction accuracy. Human stock analysts
who have been trained in machine learning methods overreact less than
traditionally trained analysts. Additionally, stock analyst predictions reflect
information not otherwise available to machine algorithms.
( 2
min )
Estimating the generalization performance is practically challenging on
out-of-distribution (OOD) data without ground truth labels. While previous
methods emphasize the connection between distribution difference and OOD
accuracy, we show that a large domain gap does not necessarily lead to a low test
accuracy. In this paper, we investigate this problem from the perspective of
feature separability, and propose a dataset-level score based upon feature
dispersion to estimate the test accuracy under distribution shift. Our method
is inspired by desirable properties of features in representation learning:
high inter-class dispersion and high intra-class compactness. Our analysis
shows that inter-class dispersion is strongly correlated with the model
accuracy, while intra-class compactness does not reflect the generalization
performance on OOD data. Extensive experiments demonstrate the superiority of
our method in both prediction performance and computational efficiency.
( 2
min )
Neural operators have emerged as a powerful tool for solving partial
differential equations in the context of scientific machine learning. Here, we
implement and train a modified Fourier neural operator as a surrogate solver
for electromagnetic scattering problems and compare its data efficiency to
existing methods. We further demonstrate its application to the gradient-based
nanophotonic inverse design of free-form, fully three-dimensional
electromagnetic scatterers, an area that has so far eluded the application of
deep learning techniques.
( 2
min )
We consider a Multi-Armed Bandit problem in which the rewards are
non-stationary and are dependent on past actions and potentially on past
contexts. At the heart of our method, we employ a recurrent neural network,
which models these sequences. In order to balance between exploration and
exploitation, we present an energy minimization term that prevents the neural
network from becoming too confident in support of a certain action. This term
provably limits the gap between the maximal and minimal probabilities assigned
by the network. In a diverse set of experiments, we demonstrate that our method
is at least as effective as methods suggested to solve the sub-problem of
Rotting Bandits, and can solve intuitive extensions of various benchmark
problems. We share our implementation at
https://github.com/rotmanmi/Energy-Regularized-RNN.
( 2
min )
This paper presents a framework for training an agent to actively request
help in object-goal navigation tasks, with feedback indicating the location of
the target object in its field of view. To make the agent more robust in
scenarios where a teacher may not always be available, the proposed training
curriculum includes a mix of episodes with and without feedback. The results
show that this approach improves the agent's performance, even in the absence
of feedback.
( 2
min )
Quantitative characterizations and estimations of uncertainty are of
fundamental importance in optimization and decision-making processes. Herein,
we propose intuitive scores, which we call certainty and doubt, that can be
used in both a Bayesian and frequentist framework to assess and compare the
quality and uncertainty of predictions in (multi-)classification decision
machine learning problems.
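The abstract does not spell out the definitions, so as a loudly hypothetical stand-in, here is one margin-based way such scores could look: "certainty" as the gap between the top two predicted class probabilities, with "doubt" as its complement.

```python
# HYPOTHETICAL definitions for illustration only; the paper's actual
# certainty/doubt scores may differ.

def certainty(probs):
    # Margin between the best and runner-up class probabilities.
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]

def doubt(probs):
    return 1.0 - certainty(probs)

confident = [0.90, 0.05, 0.05]  # clear winner -> high certainty
unsure = [0.40, 0.35, 0.25]     # close call   -> high doubt
print(certainty(confident), doubt(unsure))
```

Any such score applies equally to Bayesian posterior predictive probabilities and to frequentist classifier outputs, which is the dual use the abstract highlights.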
( 2
min )
We propose a model to forecast large realized covariance matrices of returns,
applying it to the constituents of the S&P 500 daily. To address the curse of
dimensionality, we decompose the return covariance matrix using standard
firm-level factors (e.g., size, value, and profitability) and use sectoral
restrictions in the residual covariance matrix. This restricted model is then
estimated using vector heterogeneous autoregressive (VHAR) models with the
least absolute shrinkage and selection operator (LASSO). Our methodology
improves forecasting precision relative to standard benchmarks and leads to
better estimates of minimum variance portfolios.
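The HAR-style regressors underlying VHAR models can be sketched as follows (a univariate illustration with made-up data; the multivariate, factor, and LASSO machinery is omitted): each day's realized variance is explained by its daily, weekly (5-day average), and monthly (22-day average) lags.

```python
# HAR feature construction for one realized-variance series.

def har_features(series, t):
    # Features available at day t for predicting day t+1.
    daily = series[t]
    weekly = sum(series[t - 4:t + 1]) / 5.0     # 5-day average
    monthly = sum(series[t - 21:t + 1]) / 22.0  # 22-day average
    return [daily, weekly, monthly]

rv = [float(i % 7 + 1) for i in range(100)]  # illustrative realized variances
feats = har_features(rv, 50)
print(feats)
```

Stacking such features across all entries of the covariance decomposition gives the (vector) regression that the LASSO then shrinks and selects over.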
( 2
min )
We propose an adjusted Wasserstein distributionally robust estimator -- based
on a nonlinear transformation of the Wasserstein distributionally robust (WDRO)
estimator in statistical learning. This transformation will improve the
statistical performance of WDRO because the adjusted WDRO estimator is
asymptotically unbiased and has an asymptotically smaller mean squared error.
The adjusted WDRO does not compromise the out-of-sample performance guarantee of
WDRO. Sufficient conditions for the existence of the adjusted WDRO estimator
are presented, and the procedure for the computation of the adjusted WDRO
estimator is given. Specifically, we will show how the adjusted WDRO estimator
is developed in the generalized linear model. Numerical experiments demonstrate
the favorable practical performance of the adjusted estimator over the classic
one.
( 2
min )
We are in the age of AI. I was wondering if there are any projects on the horizon for subtitle converters that can take in image-based subtitles like VobSub and HDMV PGS and turn them into SRT subs?
Microsoft just recently released a highly impressive OCR model with 558 million parameters, named TrOCR-LARGE.
TrOCR is a model that uses an image Transformer encoder and an autoregressive text Transformer decoder to perform optical character recognition (OCR). It is pre-trained in 2 stages before being fine-tuned on downstream datasets.
Study of Microsoft's TrOCR
Hugging Face Documentation
GitHub Source Code and code direct from Microsoft
( 43
min )
Predictive maintenance is a data-driven maintenance strategy for monitoring industrial assets in order to detect anomalies in equipment operations and health that could lead to equipment failures. Through proactive monitoring of an asset’s condition, maintenance personnel can be alerted before issues occur, thereby avoiding costly unplanned downtime, which in turn leads to an increase in […]
( 10
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )